Forum:WPWeb:Indexing methodology


To work efficiently on an indexing project, we need a reliable methodology with traceability at its core, so that everyone is aware of the current progress and can help at any time. This virtually eliminates the risk of people going over the same content again and again without knowing what was done in the past. It will still require some communication between project participants, though.

The methodology, or workflow, is organized around two phases:

Phase I: Database indexing, aimed at creating a sorted list of links
Phase II: Extraction indexing, aimed at adding those links to the relevant lists in articles ("Sources", "External links", etc.)

Note that this is meant to be flexible: anyone can focus on either Phase I or Phase II of an indexing project at any moment, but keep in mind that a link should still be introduced in Phase I before being processed in Phase II. Also, indexing projects aren't meant to impose anything on Wookieepedia users in general: they can add links to pages as usual without needing to refer to an indexing project.

You don't need to be an active project participant to add links and help with indexing!

Phase I: Database indexing

The goal of this phase is to provide on a single page a database/repository of all the links relevant to Wookieepedia from a designated website.

Get started:

To start, create a forum post under "Indexing projects" and detail the scope, specifics and tasks of the project. Refer to other projects to see how they are organized. This page will allow you to organize tasks and coordinate with other editors.
Make sure the website you're working with is supported by a template. If not, either create one or request that one be created in a "Tech support" forum post.
If the website you want to index is already supported by a template that has been in use for some time, you must request a datadump of all instances of this template. This gives you easy access to all links currently known on Wookieepedia and will save you a lot of time. You can also request that a "/Archive" be made, removing the need for archivedate in every instance of the template.
If you're creating a new template, you must migrate all relevant links to it. While recommended, it is not necessary to do so right away, as it can be done concurrently with Phase I. Use Link Search to identify the relevant URLs. Be mindful that this tool is sensitive to "http" vs "https", so you will need to search both when looking for older websites, as https only came into general use around 2018.
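Since Link Search treats "http" and "https" as different URLs, it can help to generate both forms of each address before searching. The following is only a minimal sketch of that idea; the function name `scheme_variants` and the example domain are hypothetical and not part of any Wookieepedia tool.

```python
from urllib.parse import urlsplit, urlunsplit


def scheme_variants(url: str) -> list[str]:
    """Return the http and https forms of a URL, in that order."""
    # Accept bare domains such as "example.com/news" by assuming a scheme.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    # Swap only the scheme; host, path and query are left untouched.
    return [urlunsplit(parts._replace(scheme=s)) for s in ("http", "https")]


# Hypothetical domain, used purely for illustration.
for variant in scheme_variants("example.com/news"):
    print(variant)
```

Each printed variant can then be pasted into Link Search in turn.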

Sorting:

The repository itself should go on a subpage (/repository) of the forum.
All links must be contained in a citation template. This allows them to be copied easily.
If sufficient data is available, the list can be sorted in chronological order with a date affixed (either in a wikinote or in plain text); otherwise, alphabetical order is sufficient. Chronological lists should use at least yearly subheadings to facilitate navigation, but for big websites with several publications a day, monthly subheadings are recommended.
In the same spirit, big websites should be organized under subheadings based on the URL structure (domain, subdomain, directory tree).
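The chronological layout described above can be sketched as follows. This is only an illustration under stated assumptions: the template name `ExampleCite` is hypothetical, the dates are invented, and the `==Year==` headings assume standard wikitext heading syntax.

```python
from collections import defaultdict
from datetime import date


def build_repository(entries: list[tuple[date, str]]) -> str:
    """Sort (date, citation) pairs chronologically under yearly subheadings."""
    by_year = defaultdict(list)
    for published, citation in sorted(entries, key=lambda e: e[0]):
        # Affix the date in plain text after the citation template.
        by_year[published.year].append(f"*{citation} ({published.isoformat()})")
    lines = []
    for year in sorted(by_year):
        lines.append(f"=={year}==")
        lines.extend(by_year[year])
    return "\n".join(lines)


# "ExampleCite" is a hypothetical citation template, not a real one.
entries = [
    (date(2006, 3, 1), "{{ExampleCite|url=b}}"),
    (date(2005, 7, 15), "{{ExampleCite|url=a}}"),
]
print(build_repository(entries))
```

For a site with several publications a day, the same grouping could be keyed on `(published.year, published.month)` to produce monthly subheadings instead.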

Indexing:

To add new links to the list, you must first find them by exploring live websites and web archives. Regardless of whether a site is still live, always check web archives (archive.org and archive.today) for deleted content.
It is recommended to work in chronological order: find when a website was created and work from there up to the present. This allows for a more thorough search and helps organize the work, as you can declare a time period complete and thus point potential participants toward other periods. However, chronological indexing isn't a fixed rule, and users are welcome to add links from any point in time.
Try to find internal search engines and archives, and don't hesitate to navigate around to understand the site structure.
Refer to our tutorial on how best to use web archives.
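As a rough aid for the archive checks above, the sketch below builds Wayback Machine (archive.org) addresses to open in a browser. It relies on the `web.archive.org/web/<timestamp or *>/<url>` address pattern; the function names and the example domain are hypothetical, and archive.today is not covered here.

```python
def wayback_calendar_url(url: str) -> str:
    """Address of the page listing every capture of one URL."""
    return f"https://web.archive.org/web/*/{url}"


def wayback_prefix_url(prefix: str) -> str:
    """Address listing captured URLs that start with the given prefix."""
    return f"https://web.archive.org/web/*/{prefix.rstrip('/')}/*"


def wayback_snapshot_url(url: str, year: int) -> str:
    """Address requesting a capture close to January 1 of the given year."""
    # The timestamp is 14 digits: YYYYMMDDhhmmss.
    return f"https://web.archive.org/web/{year}0101000000/{url}"


# Hypothetical site, used purely for illustration.
print(wayback_calendar_url("example.com/news/article-1"))
print(wayback_prefix_url("example.com/news/"))
print(wayback_snapshot_url("example.com", 2005))
```

Stepping `year` forward from the site's creation date gives one possible way to follow the chronological approach recommended above.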

Phase II: Extraction indexing

The goal of this second and final phase is to identify and extract terms from a web page to populate indexing lists in Wookieepedia's articles, such as the Sources, External links, and even Appearances (e.g., for online short stories) sections.

Make sure the project page states which sections are concerned. Some websites can be used for both Sources and External links, depending on the content. The Layout Guide and its OOU equivalent provide some guidelines.
Select a page listed in the repository and start to identify relevant terms, such as characters, documents, events, authors, etc. In other words: anything notable enough to deserve an article. Then proceed to populate Sources, External links and/or Appearances sections in these articles, while following the LG guidelines.
Once you're certain the page is fully indexed on our articles, strike it from the repository using <s></s>. Do not delete the link.
Repeat until the repository is exhausted.
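The strike-through bookkeeping can be sketched as below, assuming the repository is a wikitext list with one citation per line. The helper names and the `ExampleCite` template are hypothetical; the URL is matched as a plain substring, so pass the full URL to avoid striking a similar entry.

```python
def strike_entry(wikitext: str, url: str) -> str:
    """Wrap the repository line containing `url` in <s></s>, keeping the link."""
    out = []
    for line in wikitext.splitlines():
        if url in line and "<s>" not in line:
            # Keep any leading list marker outside the strike tags.
            marker_len = len(line) - len(line.lstrip("*# "))
            marker, body = line[:marker_len], line[marker_len:]
            line = f"{marker}<s>{body}</s>"
        out.append(line)
    return "\n".join(out)


def remaining(wikitext: str) -> list[str]:
    """List entries that have not been struck yet."""
    return [l for l in wikitext.splitlines() if l.startswith("*") and "<s>" not in l]


# "ExampleCite" is a hypothetical citation template.
repo = "*{{ExampleCite|url=a}}\n*{{ExampleCite|url=b}}"
repo = strike_entry(repo, "url=a")
print(repo)
print(remaining(repo))
```

An empty result from `remaining` would indicate that every entry in the repository has been processed.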

If you still have any doubts about how to do things properly, you can reach out to other participants through the project page or on Discord.