Content-independent generated ids for Headings and Images

vmassol · February 28, 2024, 12:50pm

Hi devs,

Context

Right now we use this strategy:

XWiki automatically generates ids based on the content. For example for heading, the id is based on the heading content. For images, the id is based on the image name.
Pros: Simple to guess and use if you know the algorithm (no long and non-understandable ids like UUIDs in the content, e.g. 123e4567-e89b-42d3-a456-556642440000)
Cons: If the heading content is modified or the image name is changed, the ids are modified and references to them will break
The current strategy is that if you need a fixed/permanent id, you use the {{id}} macro. For example:
```
{{id name="myheadingid"/}}
= Some heading =
...
{{id name="myimageid"/}}image:someimage.png
```
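To make the trade-off concrete, here is a minimal sketch of content-based id generation. It assumes a simplified rule (an `H` prefix followed by the alphanumeric characters of the heading text, with a numeric suffix on collisions); the function name `content_based_id` is hypothetical and the real XWiki generator may differ in details.

```python
import re

def content_based_id(text, used_ids):
    """Derive an id from heading text (simplified, assumed rule)."""
    # Assumption: keep only ASCII letters and digits, prefix with "H".
    base = "H" + re.sub(r"[^A-Za-z0-9]", "", text)
    candidate = base
    suffix = 0
    # Assumption: duplicates get a "-N" suffix so ids stay unique per page.
    while candidate in used_ids:
        suffix += 1
        candidate = f"{base}-{suffix}"
    used_ids.add(candidate)
    return candidate

id_before = content_based_id("Some heading", set())
id_after = content_based_id("Some renamed heading", set())
# The id follows the text, so a link to the old id no longer resolves.
print(id_before, id_after)
```

The same property that makes the id guessable (it is a pure function of the text) is what makes it change on rename.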

Problem

A user told me they’d prefer that we generate ids that do not depend on the content, so that they can rename images without breaking references to them (using the {{reference/}} macro for example).

It’s true that the current strategy is not that great in this regard and will tend to break references or URLs used as permalinks.

Proposal

We could reverse the strategy (possibly with a config option to act as backward compat for admins who want to use the old strategy):

XWiki automatically generates ids NOT based on the content (a UUID, for example 123e4567-e89b-42d3-a456-556642440000).
Pros: Do not break if the heading content is changed or the image is renamed
Cons: Cannot be guessed, so the UI must make it easy to get the id. For images, we have the lightbox feature. For headings, we’d need to implement some mouse hover that makes an icon link appear so that it can be copied, something like Build Scan® | Develocity. This would be useful even without a strategy change, BTW.

If you need a simple id, you could use the {{id}} macro. For example:

```
{{id name="myheadingid"/}}
= Some heading =
...
{{id name="myimageid"/}}image:someimage.png
```

Additionally I’ve also created Loading... to make it easier to reference an image while in WYSIWYG edit mode.

WDYT?

Thanks

ben.megson · February 28, 2024, 1:13pm

For those of us who are a little more security paranoid, there is another Con to the current approach in my opinion.

While avoidable, users could accidentally expose sensitive text (the heading content or the image name) in URLs that other pages use to link to it.

+1 for the proposal from me

MichaelHamann · February 28, 2024, 1:30pm

I see the value of content-independent ids. However, what I don’t see in this proposal is how they should be represented in XWiki syntax and where they should be generated. Do you propose to store an id attribute on all headings and images? What about standalone images, how do we store ids there?

The syntax you propose here to set ids doesn’t set an id on the elements next to them, it instead introduces another HTML element with an id that is completely unrelated to the image/heading. The proper way to set an id on an image or a heading is the following:

```
(% id="myheadingid" %)
= Some heading =

[[image:someimage.png||id="myimageid"]]
```

In particular for images, setting the id could also be easily supported in the image edit dialog in the WYSIWYG editor, where we could also display the automatically generated id.

So far, our approach in XWiki is to have internal ids that are derived from the user-visible name, like the page name that follows the page title, and to rename all references when the name is changed. For images, this would mean that whenever an image is renamed, we should have an automatic refactoring to adjust all reference macros and all anchors that refer to the id of the changed image. The same could be supported for headings: when saving a document, we perform a diff to understand if any heading ids were changed, trying to map old headings to new headings, and then adjust all links accordingly.
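As a rough illustration of the diff-on-save idea, the sketch below maps old headings to new ones by text similarity and then rewrites anchors. Everything in it is an assumption for illustration: the helper names, the simplified `H` + alphanumerics id rule, the 0.6 similarity cutoff, and the `#anchor` form of the links. It uses Python's standard `difflib` rather than any XWiki API.

```python
import difflib
import re

def heading_id(text):
    # Assumed simplified rule: "H" + alphanumeric characters of the text.
    return "H" + re.sub(r"[^A-Za-z0-9]", "", text)

def map_renamed_ids(old_headings, new_headings, cutoff=0.6):
    """Return {old_id: new_id} for headings that look renamed."""
    mapping = {}
    candidates = [h for h in new_headings if h not in old_headings]
    for old in old_headings:
        if old in new_headings:
            continue  # unchanged heading, id is still valid
        match = difflib.get_close_matches(old, candidates, n=1, cutoff=cutoff)
        if match:
            candidates.remove(match[0])
            mapping[heading_id(old)] = heading_id(match[0])
    return mapping

def rewrite_anchors(content, mapping):
    """Replace '#old_id' with '#new_id' wherever a whole id matches."""
    for old_id, new_id in mapping.items():
        pattern = "#" + re.escape(old_id) + r"(?![A-Za-z0-9-])"
        content = re.sub(pattern, "#" + new_id, content)
    return content

mapping = map_renamed_ids(
    ["Summary", "Usage examples"],
    ["Summary", "Usage examples and tips"],
)
print(rewrite_anchors("see Page1#HUsageexamples", mapping))
```

A heuristic like this can only guess at renames; headings that are rewritten heavily or swapped would need a fallback (for example, leaving the old links untouched and reporting them).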

What you’re proposing here is a breaking change: anchors can be used both in internal and external links to a page. Changing how they are generated breaks all links that contain an anchor. From my understanding, your proposal is to break all of them by default. If yes, I think this needs to be a vote.

vmassol · February 28, 2024, 2:12pm

This is a very good point and probably the main reason why we implemented ids the way they are now. I hadn’t thought of the implementation and indeed we’d need to store the generated ids somewhere and that’s a big pain.

What you’re mentioning is the proper way in HTML for sure but it’s not what we’ve been pushing for our users and documenting. We’ve always documented the usage of the id macro (and that’s why we created it btw and why it’s bundled in XS). AFAIK there’s no page on xwiki.org where we document/recommend using parameters for setting anchor ids. I’m not against doing that but we’d need to discuss what’s the best practice, document it and update the doc of https://extensions.xwiki.org/xwiki/bin/view/Extension/Id%20Macro too.

Yes, that would be a nice improvement. Same when inserting headings (but we don’t have a dialog box for that, so we’d need to introduce a way, like right-clicking a heading and using a menu entry).

You mean, same as links, store somewhere a list of places that reference a heading or an image and have extensions contribute to that (like the reference macro mentioned). Yes doable, not easy but doable.

tmortagne · February 28, 2024, 2:16pm

I’m afraid this proposal is way too light; it’s not that simple. I assume your proposal is not to generate a new UUID every time the content is parsed, as that’s obviously not going to cover your need: each time a document drops out of the cache (for example after a save, or simply because the cache is full and the document has not been modified for a while), you would end up with new ids for the headings the next time it’s loaded.

One pro of generating the ids based on the content is exactly that: it’s stable and you always get the same every time the content is parsed.
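The stability argument can be shown with a small sketch. The two parse functions are hypothetical stand-ins for "assign ids while parsing"; the content-based rule is the same simplified `H` + alphanumerics assumption, and `uuid.uuid4()` stands in for any random id.

```python
import re
import uuid

def parse_with_random_ids(headings):
    # A fresh random id per heading on every parse.
    return {h: str(uuid.uuid4()) for h in headings}

def parse_with_content_ids(headings):
    # Assumed simplified rule: the id is a pure function of the heading text.
    return {h: "H" + re.sub(r"[^A-Za-z0-9]", "", h) for h in headings}

headings = ["Summary", "Examples"]

# Simulate the document being parsed twice (e.g. after a cache eviction).
random_stable = parse_with_random_ids(headings) == parse_with_random_ids(headings)
content_stable = parse_with_content_ids(headings) == parse_with_content_ids(headings)
print(random_stable, content_stable)
```

Random ids therefore only work if they are generated once and stored, not regenerated at parse time.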

MichaelHamann · February 28, 2024, 2:19pm

Regarding the storage of random ids: what about headings that are “dynamically generated”, e.g., as the output of a Velocity macro - how would their ids be computed? With the current mechanism, you have stable ids as long as the headings are the same. With random ids, they would be different on every page load and thus unusable.

Some more thoughts:

The include and display macros use ids to include a section. At the moment, when you have a section named “Summary”, you can display that section with {{display page="Page1/Page2" section="HSummary" /}}. You could easily create a Velocity script to include all “Summary” sections of all pages in a space. With the new random ids, this won’t be possible anymore.

Seeing a link to “https://extensions.xwiki.org/xwiki/bin/view/Extension/Display%20Macro#HExamples” seems much more natural than a link with a random id.

An idea that is maybe simpler to implement: CKEditor could generate and store a random id whenever a new heading or image is inserted. Further, it could offer ways to easily edit the id with buttons to generate either a random or a content-derived id. Configuration options could allow controlling both what should happen when no id is set and which buttons are available.

Another option to avoid breaking ids would be to generate ids based on the content, but to store them in the page content, such that they won’t change when the heading or image is renamed.

Regarding sensitive names, note that attachment names are also exposed when you link to or display an attachment of another page. To avoid this, we would need to support naming strategies for attachments that automatically replace the attachment name by a random name.
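Coming back to the option of generating an id from the content but storing it in the page content, here is a minimal sketch of the behaviour. The `Heading` class, its `stored_id` field and both helpers are hypothetical; the serialized form reuses the `(% id="..." %)` parameter syntax shown earlier in the thread.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Heading:
    text: str
    stored_id: Optional[str] = None  # hypothetical: id persisted with the content

def effective_id(heading):
    """Generate the id once from the text, then keep returning the stored one."""
    if heading.stored_id is None:
        # Assumed simplified rule: "H" + alphanumeric characters of the text.
        heading.stored_id = "H" + re.sub(r"[^A-Za-z0-9]", "", heading.text)
    return heading.stored_id

def to_xwiki(heading):
    """Serialize the heading with its id as an explicit parameter."""
    return f'(% id="{effective_id(heading)}" %)\n= {heading.text} ='

h = Heading("Summary")
first = effective_id(h)
h.text = "Executive summary"   # the heading is renamed later
second = effective_id(h)
print(first, second)
print(to_xwiki(h))
```

The id stays readable at creation time and survives later renames, at the cost of extra markup in the page source.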

vmassol · February 28, 2024, 2:19pm

Yes definitely, I had missed the need to store the id if it’s not generated based on the content (thanks to Michael and you for mentioning it).

Initially I just wanted to start a brainstorming about the problem, but while writing it I thought there could be a simple solution; I hadn’t thought it through well enough…

vmassol · February 28, 2024, 2:21pm

Yes, SEO is another good con of generated ids not based on content.

vmassol · February 28, 2024, 2:25pm

Yes, it’s a good idea. With maybe some admin options to control whether an id is always stored by default in the page content for headings/images or if the user would need to check a box to store it (i.e. on demand).

tmortagne · February 28, 2024, 2:34pm

Yes, I agree that in the case of the WYSIWYG, the easiest way to cover the need is probably to propose the id (generated like now by default, but I guess we could imagine alternative configurable id generators, the same way we have configurable page name validators) and store it.
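One way to picture configurable id generators is a small registry keyed by a configuration value. This is only a sketch: the `GENERATORS` table, the strategy names `content` and `random`, and `propose_id` are all hypothetical and do not correspond to an existing XWiki component.

```python
import re
import uuid
from typing import Callable, Dict

IdGenerator = Callable[[str], str]

# Hypothetical registry; an admin setting would select one of the keys.
GENERATORS: Dict[str, IdGenerator] = {
    # Assumed simplified version of the current content-based behaviour.
    "content": lambda text: "H" + re.sub(r"[^A-Za-z0-9]", "", text),
    # Content-independent alternative: letter prefix plus 32 hex characters.
    "random": lambda text: "H" + uuid.uuid4().hex,
}

def propose_id(text, strategy="content"):
    """Return the id the editor would propose (and then store) for a new heading."""
    return GENERATORS[strategy](text)

print(propose_id("Examples"))
print(propose_id("Examples", strategy="random"))
```

Because the proposed id is stored once, the choice of generator only affects new headings and images, not existing anchors.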

Simpel · February 28, 2024, 6:59pm

That would be a nice feature for today’s strategy too. I always open the dev console to get the right id or, if a TOC is inserted, click to jump to the heading and copy the id from the address bar.

In principle, the ids for headings are more or less guessable. Headings in a language with umlauts are sometimes hard to guess, though.

vmassol · March 1, 2024, 2:17pm

I’ve opened Getting heading links and anchor ids more easily