How to fix canonical URLs and links in your pre-fork posts
As you can see here, some of my blog posts written before the Hive fork contain links pointing to steemit.com. It's time to replace them all.
![steemit link in my blog](https://images.hive.blog/DQmfDpnHj9kXEqo2ZXNSAkaPbFb4svFqWXLyUEdEuniqd3e/steemit%20link%20in%20my%20blog)
Almost all blog posts written before the fork were created with apps that are not included in [hivescript](https://www.npmjs.com/package/@hivechain/hivescript), which leads to problems with canonical URLs:
As can be seen here, my old post [update for beem: first release for HF 21](https://peakd.com/beem/@holger80/update-for-beem-first-release-for-hf-21) results in different canonical URLs on different front-ends. Search engines then treat this as duplicate content.
The post was written through palnet. As there is no entry for palnet in [hivescript](https://www.npmjs.com/package/@hivechain/hivescript), the front-ends do not know how to build a proper canonical URL:
![wrong canonical url](https://images.hive.blog/DQmXe1t2QwJtizzVXDpG89GVNxyYAorFu6Tjj5LRnXkEJss/wrong%20canonical%20url)
# Fixing the mess
Fixing means:
* replacing all steemit, steempeak, ... links with relative links
* setting `canonical_url` for each post written before 2020-03-20, to fix canonical URLs.
### Small update
The script now uses relative links: when it finds a link to steemit.com, steempeak.com, ..., it replaces it with a relative link. A relative link looks like `[holger80](/@holger80)` or `[this post](/hive-139531/@holger80/how-to-fix-canonical-urls-and-links-in-your-pre-fork-posts)`.
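To make the rewrite rule concrete, here is a minimal sketch of how an old front-end link could be turned into a relative link. The helper `to_relative_link` and its `category` argument are hypothetical and only used for illustration: the full script below resolves the account or post on chain with beem, while this sketch takes the category as a plain argument so it runs without a node.

```python
# Front-ends whose links should be rewritten (same hosts as in the script below).
OLD_FRONTENDS = ("steemit.com", "steempeak.com", "busy.org", "partiko.app")


def to_relative_link(link, category=None):
    """Return a relative link for an old front-end URL, or None if it is left alone.

    `category` is a stand-in for the on-chain lookup that the real script does.
    """
    if not any(host in link for host in OLD_FRONTENDS):
        return None  # not a link to an old front-end
    parts = link.split("@", 1)
    if len(parts) == 1:
        return None  # no @author part, nothing to rewrite
    authorperm = parts[1]
    if "/" not in authorperm:
        return "/@" + authorperm  # link to an account
    if category is None:
        return None  # a post link needs its category
    return "/" + category + "/@" + authorperm  # link to a post


print(to_relative_link("https://steemit.com/@holger80"))
print(to_relative_link(
    "https://steemit.com/beem/@holger80/update-for-beem-first-release-for-hf-21",
    category="beem"))
```

Links that do not point to one of the old front-ends, or that have no `@author` part, are returned as `None` and stay untouched.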
### Small update 2
There are now three boolean parameters, which can be used to set the following:
* `replace_steemit_links`: when True, steemit, ... links will be replaced
* `use_relative_links`: when True, relative links will be used (starting with `/`)
* `add_canonical_url`: when True, a `canonical_url` is added to the metadata
### Small update 3
It is now possible to use the same script to fix the canonical links on STEEM for all posts written before the fork.
When you want to use the script on STEEM:
* set `target_blockchain = "steem"`
When you want to use the script on HIVE:
* set `target_blockchain = "hive"`
### Python code
The following script uses beem and does exactly this.
beem can be installed by
```
pip install beem
```
or
```
conda install beem
```
Store the following as `fix_canonical_urls_hive.py`:
```
#!/usr/bin/python
from beem import Hive, Steem
from beem.utils import addTzInfo
from beem.account import Account
from beem.comment import Comment
from beem.nodelist import NodeList
import time
from datetime import datetime
import getpass


if __name__ == "__main__":
    # Parameter
    canonical_url = "https://hive.blog"
    replace_steemit_links = True
    use_relative_links = True
    add_canonical_url = True
    target_blockchain = "hive"  # can be hive or steem
    # ----
    # at least one option must be true
    assert replace_steemit_links or add_canonical_url
    assert target_blockchain in ["hive", "steem"]
    # Canonical url must not end with /
    if canonical_url[-1] == "/":
        canonical_url = canonical_url[:-1]

    nodelist = NodeList()
    nodelist.update_nodes()
    test_run_answer = input("Do a test run? [y/n]")
    if test_run_answer in ["y", "Y", "yes"]:
        test_run = True
        print("Doing a test run on %s!" % target_blockchain)
    else:
        test_run = False

    # A test run needs no key, a real run asks for the posting key
    if test_run:
        if target_blockchain == "hive":
            blockchain_instance = Hive(node=nodelist.get_hive_nodes())
        else:
            blockchain_instance = Steem(node="https://api.steemit.com")
    else:
        wif = getpass.getpass(prompt='Enter your posting key for %s.' % target_blockchain)
        if target_blockchain == "hive":
            blockchain_instance = Hive(node=nodelist.get_hive_nodes(), keys=[wif])
        else:
            blockchain_instance = Steem(node="https://api.steemit.com", keys=[wif])
    if target_blockchain == "hive":
        assert blockchain_instance.is_hive
    else:
        assert blockchain_instance.is_steem

    account = input("Account name =")
    account = Account(account, blockchain_instance=blockchain_instance)
    if add_canonical_url:
        print("Start to fix canonical_url on %s for %s" % (target_blockchain, account["name"]))
    if replace_steemit_links:
        print("Start to replace steemit links on %s for %s" % (target_blockchain, account["name"]))

    apps_with_cannonical_url = ["hiveblog", "peakd", "esteem", "steempress", "actifit",
                                "travelfeed", "3speak", "steemstem", "leofinance", "clicktrackprofit",
                                "dtube"]
    hive_fork_date = addTzInfo(datetime(2020, 3, 20, 14, 0, 0))
    blog_count = 0
    expected_count = 100
    # Blog entries are fetched in batches of 100 until a batch is not full
    while expected_count - blog_count == 100:
        for blog in account.get_blog_entries(start_entry_id=blog_count, raw_data=False):
            blog_count += 1
            # Skip comments, reblogs, already fixed posts and posts written after the fork
            if blog["parent_author"] != "":
                continue
            if blog["author"] != account["name"]:
                continue
            if "canonical_url" in blog.json_metadata and canonical_url in blog.json_metadata["canonical_url"]:
                continue
            if "app" in blog.json_metadata and blog.json_metadata["app"].split("/")[0] in apps_with_cannonical_url and target_blockchain == "hive":
                continue
            if blog["created"] > hive_fork_date:
                continue
            body = blog.body
            if "links" in blog.json_metadata:
                links = blog.json_metadata["links"]
            else:
                links = None
            if "links" in blog.json_metadata and replace_steemit_links:
                for link in blog.json_metadata["links"]:
                    if "steemit.com" in link or "steempeak.com" in link or "busy.org" in link or "partiko.app" in link:
                        authorperm = link.split("@")
                        acc = None
                        post = None
                        new_link = ""
                        if len(authorperm) == 1:
                            continue
                        authorperm = authorperm[1]
                        if authorperm.find("/") == -1:
                            # Link to an account
                            try:
                                acc = Account(authorperm, blockchain_instance=blockchain_instance)
                                if use_relative_links:
                                    new_link = "/@" + acc["name"]
                                else:
                                    new_link = canonical_url + "/@" + acc["name"]
                            except Exception:
                                continue
                        else:
                            # Link to a post
                            try:
                                post = Comment(authorperm, blockchain_instance=blockchain_instance)
                                if use_relative_links:
                                    new_link = "/" + post.category + "/" + post.authorperm
                                else:
                                    new_link = canonical_url + "/" + post.category + "/" + post.authorperm
                            except Exception:
                                continue
                        if new_link != "":
                            for i in range(len(links)):
                                if links[i] == link:
                                    links[i] = new_link
                            body = body.replace(link, new_link)
                            print("Replace %s with %s" % (link, new_link))
            json_metadata = blog.json_metadata or {}
            if links is not None and replace_steemit_links:
                json_metadata["links"] = links
            if add_canonical_url:
                json_metadata["canonical_url"] = canonical_url + "/" + blog["category"] + "/@" + blog["author"] + "/" + blog["permlink"]
                print("Edit post nr %d with canonical_url=%s" % (blog_count, json_metadata["canonical_url"]))
            print("---")
            if not test_run:
                try:
                    blog.edit(body, meta=json_metadata, replace=True)
                except Exception:
                    print("Skipping %s due to error" % blog.authorperm)
                time.sleep(6)
        expected_count += 100
```
You can now start the script with:
```
python fix_canonical_urls_hive.py
```
If you are on Linux, you should replace `pip` with `pip3` and `python` with `python3`.
## How does it work
The script goes through all blog posts written before 2020-03-20. Whenever a post was written with an app that is not properly handled by [hivescript](https://www.npmjs.com/package/@hivechain/hivescript), a new `canonical_url` is set.
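The selection rule can be sketched as a small standalone function. This is only an illustration under some assumptions: `needs_fix` is a hypothetical helper, the post is reduced to its creation date and its `json_metadata` dict, and the app list is shortened. The full script applies the same checks directly on beem `Comment` objects.

```python
from datetime import datetime, timezone

# Fork date as used in the script, and a shortened list of apps that
# already produce a proper canonical URL on Hive.
HIVE_FORK_DATE = datetime(2020, 3, 20, 14, 0, 0, tzinfo=timezone.utc)
APPS_WITH_CANONICAL_URL = {"hiveblog", "peakd", "esteem", "steempress"}


def needs_fix(created, json_metadata, canonical_url, target_blockchain="hive"):
    """Return True when a post should get a new canonical_url."""
    if created > HIVE_FORK_DATE:
        return False  # written after the fork
    if canonical_url in json_metadata.get("canonical_url", ""):
        return False  # already fixed
    app = json_metadata.get("app", "").split("/")[0]
    if target_blockchain == "hive" and app in APPS_WITH_CANONICAL_URL:
        return False  # front-ends already know this app
    return True


old_post_date = datetime(2019, 8, 1, tzinfo=timezone.utc)
print(needs_fix(old_post_date, {"app": "palnet/1.0"}, "https://hive.blog"))
print(needs_fix(old_post_date, {"app": "peakd/2020.03.1"}, "https://hive.blog"))
```

A palnet post from before the fork is selected, while a post written with a known app is skipped on Hive.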
You can define your preferred front-end here:
```
canonical_url = "https://hive.blog"
```
If you prefer another front-end, you can replace this line with one of the following:
* `canonical_url = "https://peakd.com"`
* `canonical_url = "https://leofinance.io"`
* `canonical_url = "https://esteem.app"`
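As a small illustration of what ends up in the metadata, the sketch below builds the canonical URL from the chosen front-end and the post's category, author and permlink, in the same way as the script. The function name `build_canonical_url` is hypothetical and only used here.

```python
def build_canonical_url(frontend, category, author, permlink):
    """Build <frontend>/<category>/@<author>/<permlink> without a double slash."""
    base = frontend.rstrip("/")  # the canonical url must not end with /
    return base + "/" + category + "/@" + author + "/" + permlink


print(build_canonical_url(
    "https://peakd.com/", "beem", "holger80",
    "update-for-beem-first-release-for-hf-21"))
```

The result is stored as `canonical_url` in the post's `json_metadata`, so the front-ends that read this field can report the same URL.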
In the next step, all links used in the post are checked. Whenever a link to steemit.com, steempeak.com, busy.org or partiko.app points to a valid Hive post or a valid Hive user, it is replaced by a relative URL.
## Test run
You can do a test run to check what will be changed by the script:
![test_run](https://images.hive.blog/DQmZht6KJmnJkBGebgwyXdGBpj4dSCCYkdqwzn6PpwD4ado/test_run)
This now shows the following information:
![test result](https://images.hive.blog/DQmWE3JdvZBYJauzmAHjFFnsCwyuZ6YBHzonbbSbtPf5Gdx/test%20result)
The canonical URL that will be set is shown, as well as all links that will be replaced.
## Fixing your posts
We can now start to fix all old posts:
![starting the script](https://images.hive.blog/DQmdaFJFkxsAaMmc8wG5vESFwRGqPJwRR2GPXySJtvEAxKh/starting%20the%20script)
## Results
All changes have been broadcast:
![Broadcasted posts](https://images.hive.blog/DQmNq6JKN9Cr5mDkBMsEcL8nCqRvH7ktkprTNXTU75K4Pz4/Broadcasted%20posts)
The links have been corrected, as shown here:
![](https://images.hive.blog/DQmUFVmj9vW84ZH1aa2V4PfSKHsJLsARDUkWbo62KpxnnfA/image)
There seems to be a bug on hive.blog: steemit.com links are shown as internal links and hive.blog links are shown as external links.
The canonical url is also fixed:
![canonical urls](https://images.hive.blog/DQmVjxuvCHYtfaS2p6c9i4wj43siJUMJciB2gwKE8dn5RMs/canonical%20urls)
It seems that esteem.app has not changed its canonical URL yet. As I know that esteem.app should read the `canonical_url` parameter (it works for steempress), it may correct the canonical URLs later.
After a fix on esteem.app, esteem.app is now using the correct canonical URL:
![canonical url from esteem.app](https://images.hive.blog/DQmcbaPV3teUczT5WmxuaMybeyz4hdTEbaG5NV6kQ2HGHhj/canonical%20url%20from%20esteem.app)
## Results on STEEM
Setting `canonical_url` also works on steemit:
![canonical url on steemit](https://images.hive.blog/DQmShxP35w4ARXfryjEdsdWnuVKV1zebkSoRgbGCjTQRQbC/canonical%20url%20on%20steemit)
I used [seoreviewtools](https://www.seoreviewtools.com/canonical-url-location-checker/) to check the canonical urls.
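If you prefer to check a page yourself, the canonical URL is the `href` of the `<link rel="canonical">` tag in the page's HTML. The following is a minimal sketch using only the Python standard library. The names `CanonicalLinkParser` and `find_canonical_urls` are made up for this example, and the HTML string is a stand-in for a page you have downloaded.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel and attributes.get("href"):
            self.canonical_urls.append(attributes["href"])


def find_canonical_urls(html):
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical_urls


# Stand-in for the HTML of a post page
page = ('<html><head><link rel="canonical" '
        'href="https://hive.blog/beem/@holger80/update-for-beem-first-release-for-hf-21"/>'
        '</head><body></body></html>')
print(find_canonical_urls(page))
```

If different front-ends return the same URL for the same post, the canonical URL is set consistently.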
___
*If you like what I do, consider casting a vote for me as witness on [Hivesigner](https://hivesigner.com/sign/account-witness-vote?witness=holger80&approve=1) or on [PeakD](https://peakd.com/witnesses)*
See: How to fix canonical URLs and links in your pre-fork posts by @holger80