HTML links in dumped formatted output from the Links 2.21 browser

DanielFetchinson

Links 2.21 is a fantastic text-based browser which is able to output formatted text from URLs.

links -dump "https://example.com/page.html" > output.txt

As is, output.txt contains all links as text only. For example, if there is a link in the HTML source like this:

<a href="/some/link/example.html">Some Text</a>

then output.txt will simply have "Some Text" but nothing from the href attribute.

What I'd like to do is have the link targets included in the output, for example like this:

[Some Text|https://example.com/some/link/example.html]

or anything similar. Is this possible? The browser clearly has this info because when it renders the page, the links are "clickable" (actually selectable by keys in text mode) and it correctly follows all links.

Or is there another way of converting a web page to plain text but including all the info about <a ...> tags in a structured way?

Note that I'm fully aware of tons of tools to extract links from web pages and tons of tools to convert web pages to text, but I haven't found anything that does both at the same time.
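
For the second question, one possible approach that does not involve Links at all is a small script built on Python's standard html.parser module. The following is only a minimal sketch: the names LinkTextExtractor and html_to_text_with_links are made up for illustration, the [text|URL] format is the one asked for above, and no attempt is made to reproduce the layout that Links produces (script and style contents are not filtered out either).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkTextExtractor(HTMLParser):
    """Collects page text, rewriting <a href=...>text</a> as [text|absolute URL]."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url   # used to resolve relative hrefs
        self.parts = []            # pieces of the output text
        self.href = None           # absolute URL of the <a> being read, if any
        self.link_text = []        # text collected inside the current <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.href = urljoin(self.base_url, href)
                self.link_text = []

    def handle_data(self, data):
        # Text inside a link is buffered; everything else goes straight out.
        if self.href is not None:
            self.link_text.append(data)
        else:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            text = "".join(self.link_text).strip()
            self.parts.append(f"[{text}|{self.href}]")
            self.href = None


def html_to_text_with_links(html, base_url):
    """Return the text of `html` with links inlined as [text|URL]."""
    parser = LinkTextExtractor(base_url)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Calling html_to_text_with_links with the HTML source of a page and the URL it was fetched from returns plain text in which relative hrefs have been resolved against that URL.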

Bavi_H

If it is acceptable to have the link addresses listed at the end of the dump, you can do:

links -html-numbered-links 1 -dump "https://example.com/"

The result will look something like this:

                                 Example Domain

   This domain is for use in illustrative examples in documents. You may use
   this domain in literature without prior coordination or asking for
   permission.

   [1]More information...

Links:
1. https://www.iana.org/domains/example
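
If the addresses are wanted inline rather than at the end, the numbered dump can be post-processed. Below is a minimal sketch in Python, assuming the dump always has the shape shown above: [N] markers in the body and a trailing "Links:" section with one "N. URL" line per link. The function name inline_numbered_links is made up for illustration. Because the dump marks only where a link starts, not where its text ends, the sketch puts the URL into the marker instead of wrapping the link text.

```python
import re


def inline_numbered_links(dump: str) -> str:
    """Replace each [N] marker with [N|URL], using the trailing 'Links:' list."""
    # Split at the last "Links:" line; everything after it is the URL list.
    body, sep, tail = dump.rpartition("\nLinks:\n")
    if not sep:
        return dump  # no link list found, leave the dump unchanged

    # Build a number -> URL table from lines such as "1. https://...".
    urls = {}
    for line in tail.splitlines():
        match = re.match(r"\s*(\d+)\.\s+(\S+)", line)
        if match:
            urls[match.group(1)] = match.group(2)

    def substitute(match):
        url = urls.get(match.group(1))
        # Markers without a matching entry are kept as they are.
        return f"[{match.group(1)}|{url}]" if url else match.group(0)

    return re.sub(r"\[(\d+)\]", substitute, body)
```

The function takes the whole dump as one string, for example the contents of output.txt, and returns the body without the "Links:" section. Page text that happens to contain something like [1] literally would also be rewritten, so the result is worth checking by eye.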
