July 16th, 2024

10% of the Top Million Sites Are Dead

Craig Campbell's research finds that roughly 10% of the top million sites are dead. He reports data issues in the Majestic Million dataset and urges caution when using it: 10.7% of its domains were unreachable, which casts doubt on the list's reliability. Campbell suggests comparing it against alternative domain lists.

According to Craig Campbell's research, about 10% of the top million sites are dead. He analyzed the Majestic Million dataset, which ranks websites by the number of links pointing to them. Campbell found data issues in the dataset and stressed the importance of verifying such data before using it. One issue was domain normalization: domains with and without the www prefix were not handled consistently. Campbell then checked whether the listed sites respond to HTTP requests. The results showed that 10.7% of the domains were unreachable, which raises concerns about the quality of the list. Although some of the connectivity failures may have other explanations, Campbell expressed doubts about the dataset's reliability. He suggested investigating alternative top-domain lists for comparison, and he shared a CSV file of the HTTP response codes for anyone who wants to explore the data further.
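
The summary does not say how Campbell implemented the normalization or the HTTP check, so the following is only a minimal sketch of the two steps under stated assumptions. The function names (`normalize_domain`, `dedupe`, `probe`, `unreachable_share`), the 5-second timeout, the plain-HTTP `HEAD` request, and the rule "no HTTP response at all counts as unreachable" are all hypothetical choices, not his method.

```python
import http.client
import urllib.error
import urllib.request
from typing import Dict, Iterable, List, Optional


def normalize_domain(domain: str) -> str:
    """Lowercase, trim, drop a trailing dot and a leading 'www.' label."""
    d = domain.strip().lower().rstrip(".")
    # Only strip 'www.' when a registrable-looking name remains (assumption).
    if d.startswith("www.") and "." in d[4:]:
        d = d[4:]
    return d


def dedupe(domains: Iterable[str]) -> List[str]:
    """Normalize each entry and keep the first occurrence, preserving order."""
    seen = set()
    unique = []
    for raw in domains:
        d = normalize_domain(raw)
        if d and d not in seen:
            seen.add(d)
            unique.append(d)
    return unique


def probe(domain: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code, or None if no HTTP response arrived."""
    req = urllib.request.Request(f"http://{domain}/", method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just with a 4xx/5xx code: still reachable.
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # DNS failure, refused connection, timeout, malformed reply, bad URL.
        return None


def unreachable_share(results: Dict[str, Optional[int]]) -> float:
    """Fraction of probed domains that returned no HTTP response."""
    if not results:
        return 0.0
    dead = sum(1 for status in results.values() if status is None)
    return dead / len(results)
```

A run over a list would look like `results = {d: probe(d) for d in dedupe(domains)}` followed by `unreachable_share(results)`. A single probe from one network location can misclassify sites that block automated requests or are only temporarily down, which is one reason a figure such as 10.7% should be read with some caution.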

2 comments
By @zorrn - 6 months