
8 Python Libraries for SEO & How to Use Them

Python libraries are a fun and accessible way to start learning and using Python for SEO.

A Python library is a collection of useful functions and code that allow you to complete a number of tasks without needing to write the code from scratch.

There are over 100,000 libraries available for Python, which can be used for functions from data analysis to creating video games.

In this article, you'll find a number of different libraries I have used for completing SEO projects and tasks. All of them are beginner-friendly, and you'll find plenty of documentation and resources to help you get started.

Why Are Python Libraries Useful for SEO?

Each Python library contains functions and variables of all types (arrays, dictionaries, objects, and so on) which can be used to perform different tasks.

For SEO, for example, they can be used to automate certain things, predict outcomes, and provide intelligent insights.

It is possible to work with just vanilla Python, but libraries can be used to make tasks much easier and quicker to write and complete.

Python Libraries for SEO Tasks

There are a number of useful Python libraries for SEO tasks including data analysis, web scraping, and visualizing insights.

This is not an exhaustive list, but these are the libraries I find myself using the most for SEO purposes.

Pandas

Pandas is a Python library used for working with table data. It allows high-level data manipulation where the key data structure is the DataFrame.

DataFrames are similar to Excel spreadsheets; however, they are not limited to row and byte limits and are also much faster and more efficient.

The best way to get started with Pandas is to take a simple set of data (for example, a crawl of your website) and save it in Python as a DataFrame.

Once this is stored in Python, you can perform a number of different analysis tasks including aggregating, pivoting, and cleaning data.

For example, if I have a complete crawl of my website and want to extract only those pages that are indexable, I will use a built-in Pandas function to include only those URLs in my DataFrame:

import pandas as pd

df = pd.read_csv('/Users/rutheverett/Documents/Folder/file_name.csv')
df.head()

indexable = df[(df.indexable == True)]
indexable
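To illustrate the aggregating and pivoting mentioned above, here is a minimal sketch, assuming the crawl export contains hypothetical url, status_code, and segment columns:

import pandas as pd

# Hypothetical crawl export with url, status_code, and segment columns
df = pd.read_csv('/Users/rutheverett/Documents/Folder/file_name.csv')

# Aggregate: count URLs per status code
df.groupby('status_code')['url'].count()

# Pivot: status codes broken down by site segment
pd.pivot_table(df, index='segment', columns='status_code', values='url', aggfunc='count')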

Requests

The next library is called Requests and is used to make HTTP requests in Python.

Requests uses different request methods such as GET and POST to make a request, with the results being stored in Python.

One example of this in action is a simple GET request of a URL, which will print out the status code of a page:

import requests

response = requests.get('https://www.deepcrawl.com')
print(response)

You can then use this result to create a decision-making function, where a 200 status code means the page is available but a 404 means the page is not found.

if response.status_code == 200:
    print('Success!')
elif response.status_code == 404:
    print('Not Found.')

You can also use different requests such as headers, which display useful information about the page like the content type or how long it took to cache the response.

headers = response.headers
print(headers)

response.headers['Content-Type']

There is also the ability to simulate a specific user agent, such as Googlebot, in order to extract the response this specific bot will see when crawling the page.

headers = {'User-Agent': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'}
ua_response = requests.get('https://www.deepcrawl.com/', headers=headers)
print(ua_response)

Beautiful Soup

Beautiful Soup is a library used to extract data from HTML and XML files.

Fun fact: The BeautifulSoup library was actually named after the poem from Alice’s Adventures in Wonderland by Lewis Carroll.

As a library, BeautifulSoup is used to make sense of web files and is most often used for web scraping, as it can transform an HTML document into different Python objects.

For example, you can take a URL and use Beautiful Soup together with the Requests library to extract the title of the page.

from bs4 import BeautifulSoup
import requests

url = 'https://www.deepcrawl.com'
req = requests.get(url)
soup = BeautifulSoup(req.text, "html.parser")

title = soup.title
print(title)

Additionally, using the find_all method, BeautifulSoup enables you to extract certain elements from a page, such as all a href links on the page:

url = 'https://www.deepcrawl.com/knowledge/technical-seo-library/'
req = requests.get(url)
soup = BeautifulSoup(req.text, "html.parser")

for link in soup.find_all('a'):
    print(link.get('href'))

Putting Them Together

These three libraries can also be used together, with Requests used to make the HTTP request to the page we would like to use BeautifulSoup to extract information from.

We can then transform that raw data into a Pandas DataFrame to perform further analysis.

url = 'https://www.deepcrawl.com/blog/'
req = requests.get(url)
soup = BeautifulSoup(req.text, "html.parser")

links = soup.find_all('a')
df = pd.DataFrame({'links': links})
df
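In practice, you may want just the href strings rather than the full tag objects; a minimal variation on the snippet above:

# Keep only the href attribute of each anchor tag
df = pd.DataFrame({'links': [link.get('href') for link in links]})
df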

Matplotlib and Seaborn

Matplotlib and Seaborn are two Python libraries used for creating visualizations.

Matplotlib allows you to create a number of different data visualizations such as bar charts, line graphs, histograms, and even heatmaps.

For example, if I wanted to take some Google Trends data to display the queries with the most popularity over a period of 30 days, I could create a bar chart in Matplotlib to visualize all of these.
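As a minimal sketch of that bar chart, assuming hypothetical queries and popularity lists exported from Google Trends:

import matplotlib.pyplot as plt

# Hypothetical Google Trends export: query names and popularity scores
queries = ['query one', 'query two', 'query three']
popularity = [75, 52, 38]

plt.bar(queries, popularity)
plt.xlabel('Query')
plt.ylabel('Popularity (30 days)')
plt.title('Most Popular Queries')
plt.show()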

Seaborn, which is built upon Matplotlib, provides even more visualization patterns such as scatterplots, box plots, and violin plots in addition to line and bar graphs.

It differs slightly from Matplotlib as it uses less syntax and has built-in default themes.

One way I’ve used Seaborn is to create line graphs in order to visualize log file hits to certain segments of a website over time.

import seaborn as sns
import matplotlib.pyplot as plt

sns.lineplot(x="month", y="log_requests_total", hue='category', data=pivot_status)
plt.show()

This particular example takes data from a pivot table, which I was able to create in Python using the Pandas library, and is another way these libraries work together to create an easy-to-understand picture from the data.
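A sketch of how a pivot table like pivot_status might be built with Pandas, assuming a hypothetical log file export with month, category, and log_requests columns:

import pandas as pd

# Hypothetical log file export: one row per request
logs = pd.read_csv('log_file_export.csv')

# Sum log requests per month for each site category
pivot_status = logs.pivot_table(index=['month', 'category'], values='log_requests', aggfunc='sum').reset_index()
pivot_status = pivot_status.rename(columns={'log_requests': 'log_requests_total'})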

Advertools

Advertools is a library created by Elias Dabbas that can be used to help manage, understand, and make decisions based on the data we have as SEO professionals and digital marketers.

Sitemap Analysis

This library allows you to perform a number of different tasks such as downloading, parsing, and analyzing XML sitemaps to extract patterns or analyze how often content is added or changed.
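A minimal sketch using advertools' sitemap_to_df function, which downloads and parses an XML sitemap into a DataFrame (the sitemap URL here is illustrative):

import advertools as adv

# Download and parse an XML sitemap (or sitemap index) into a DataFrame
sitemap_df = adv.sitemap_to_df('https://www.deepcrawl.com/sitemap.xml')

# If the sitemap includes <lastmod>, this shows how often content is added or changed
sitemap_df['lastmod'].dt.date.value_counts().head()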

Robots.txt Analysis

Another interesting thing you can do with this library is to use a function to extract a website's robots.txt into a DataFrame, in order to easily understand and analyze the rules set.

You can also run a test within the library in order to check whether a particular user agent is able to fetch certain URLs or folder paths.
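A sketch of both steps using advertools' robotstxt_to_df and robotstxt_test functions (the URLs and paths are illustrative):

import advertools as adv

# Pull a site's robots.txt into a DataFrame of its rules
robots_df = adv.robotstxt_to_df('https://www.deepcrawl.com/robots.txt')

# Test whether given user agents are able to fetch given URL paths
test_df = adv.robotstxt_test('https://www.deepcrawl.com/robots.txt',
                             user_agents=['Googlebot'],
                             urls=['/blog/', '/knowledge/'])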

URL Analysis

Advertools also enables you to parse and analyze URLs in order to extract information and better understand analytics, SERP, and crawl data for certain sets of URLs.

You can also use the library to split URLs in order to determine things such as the HTTP scheme being used, the main path, additional parameters, and query strings.
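A minimal sketch with advertools' url_to_df function (the example URLs are illustrative):

import advertools as adv

urls = ['https://www.deepcrawl.com/blog/?utm_source=twitter',
        'https://www.deepcrawl.com/knowledge/technical-seo-library/']

# Split each URL into scheme, domain, path directories, and query parameters
url_df = adv.url_to_df(urls)
url_df[['scheme', 'netloc', 'path']]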

Selenium

Selenium is a Python library that is generally used for automation purposes. The most common use case is testing web applications.

One popular example of a Selenium automation flow is a script that opens a browser and performs a number of different steps in a defined sequence, such as filling in forms or clicking certain buttons.

Selenium employs the same principles as the Requests library, which we covered earlier.

However, it will not only send the request and wait for the response; it will also render the webpage that is being requested.

To get started with Selenium, you will need a WebDriver in order to perform the interactions with the browser.

Each browser has its own WebDriver; Chrome has ChromeDriver and Firefox has GeckoDriver, for example.

These are easy to download and set up with your Python code. Here is a useful article explaining the setup process, with an example project.
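A minimal sketch of the flow described above, assuming ChromeDriver is installed and using Selenium 4 syntax (the URL and element locator are illustrative):

from selenium import webdriver
from selenium.webdriver.common.by import By

# Open a Chrome browser controlled by ChromeDriver
driver = webdriver.Chrome()

# Load the page and render it as a real browser would
driver.get('https://www.deepcrawl.com')
print(driver.title)

# Interact with the page, for example clicking the first link (locator is illustrative)
first_link = driver.find_element(By.TAG_NAME, 'a')
first_link.click()

driver.quit()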

Scrapy

The final library I wanted to cover in this article is Scrapy.

While we can use the Requests module to crawl and extract internal data from a webpage, in order to pass that data along and extract useful insights we also need to combine it with BeautifulSoup.

Scrapy essentially allows you to do both of these in one library.

Scrapy is also significantly faster and more powerful; it completes requests to crawl, extract, and parse data in a set sequence, and allows you to mask the data.

Within Scrapy, you can define a number of instructions such as the name of the domain you would like to crawl, the start URL, and certain page folders the spider is or isn't allowed to crawl, as in the sketch below.
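A sketch of how those allow/disallow instructions can be expressed with Scrapy's Rule and LinkExtractor (the folder paths are illustrative):

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class FolderSpider(CrawlSpider):
    name = 'folders'
    allowed_domains = ['www.deepcrawl.com']
    start_urls = ['https://www.deepcrawl.com/']

    # Follow only links under /knowledge/ and never under /careers/ (illustrative paths)
    rules = (
        Rule(LinkExtractor(allow=r'/knowledge/', deny=r'/careers/'),
             callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        yield {'url': response.url}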

Scrapy can be used to extract all of the links on a certain page and store them in an output file, for example:

from scrapy.spiders import CrawlSpider

class SuperSpider(CrawlSpider):
    name = 'extractor'
    allowed_domains = ['www.deepcrawl.com']
    start_urls = ['https://www.deepcrawl.com/knowledge/technical-seo-library/']
    base_url = 'https://www.deepcrawl.com'

    def parse(self, response):
        for link in response.xpath('//div/p/a'):
            yield {
                "link": self.base_url + link.xpath('.//@href').get()
            }

You can take this one step further and follow the links found on a webpage to extract information from all the pages which are being linked to from the start URL, kind of like a small-scale replication of Google finding and following the links on a page.

from scrapy.spiders import CrawlSpider, Rule

class SuperSpider(CrawlSpider):
    name = 'follower'
    allowed_domains = ['en.wikipedia.org']
    start_urls = ['https://en.wikipedia.org/wiki/Web_scraping']
    base_url = 'https://en.wikipedia.org'
    custom_settings = {
        'DEPTH_LIMIT': 1
    }

    def parse(self, response):
        for next_page in response.xpath('.//div/p/a'):
            yield response.follow(next_page, self.parse)
        for quote in response.xpath('.//h1/text()'):
            yield {'quote': quote.extract()}

Learn more about these projects, among other example projects, here.

Final Thoughts

As Hamlet Batista always said, “the best way to learn is by doing.”

I hope that discovering some of the libraries available has inspired you to get started with learning Python, or to deepen your knowledge.

Python Contributions from the SEO Industry

Hamlet also loved sharing resources and projects from those in the Python SEO community. To honor his passion for encouraging others, I wanted to share some of the amazing things I have seen from the community.

As a wonderful tribute to Hamlet and the SEO Python community he helped to cultivate, Charly Wargnier created SEO Pythonistas to collect contributions of the amazing Python projects those in the SEO community have created.

Hamlet's priceless contributions to the SEO community are featured there.

Moshe Ma-Yafit created a super cool script for log file analysis, and in this post he explains how the script works. The visualizations it can display include Googlebot hits by device, daily hits by response code, response code % total, and more.

Koray Tuğberk GÜBÜR is currently working on a Sitemap Health Checker. He also hosted a RankSense webinar with Elias Dabbas where he shared a script that records SERPs and analyzes algorithms.

It essentially records SERPs at regular time intervals, and you can crawl all the landing pages, blend data, and create some correlations.

John McAlpin wrote an article detailing how you can use Python and Data Studio to spy on your competitors.

JC Chouinard wrote a complete guide to using the Reddit API. With this, you can perform things such as extracting data from Reddit and posting to a subreddit.

Rob May is working on a new GSC analysis tool and building a few new domain/real sites in Wix to measure against its higher-end WordPress competitor while documenting it.

Masaki Okazawa also shared a script that analyzes Google Search Console data with Python.

More Resources:

How to Automate the URL Inspection Tool with Python & JavaScript

6 SEO Tasks to Automate with Python

Advanced Technical SEO: A Complete Guide

Image Credits

All screenshots taken by author, March 2021
