Collecting Distributor Information with a Python Web Scraper

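A common task for salespeople is gathering a company's channel information. Suppose a company has 50 channel partners worldwide. If I copy each partner's company name, address, and website into an Excel spreadsheet by hand, I have to copy 150 times and paste 150 times. People dislike repetitive, tedious work like this; it is much better suited to a computer.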

Suppose I need to collect all the channel information on the following page into a single table. How can this be done on a computer?

https://www.biologic.net/sales-network/

Python, one of today's mainstream programming languages, can solve this problem with two modules: requests and Beautiful Soup.

Program steps

1. Make a request

Begin by importing the Requests module:

import requests

Now, let’s try to get a webpage.

r = requests.get('https://www.biologic.net/sales-network/')

Now, we have a Response object called r. We can get all the information we need from this object.
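
As a minimal sketch of what "getting information from the Response object" can look like in practice, the helper below wraps the request with a timeout, a status check, and an encoding fallback. The function name `fetch_html` and the 10-second timeout are assumptions chosen for illustration, not part of the original script.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (hypothetical helper)."""
    # A timeout keeps the script from hanging on an unresponsive server.
    r = requests.get(url, timeout=timeout)
    # Raise requests.exceptions.HTTPError for 4xx/5xx responses.
    r.raise_for_status()
    # If no encoding could be determined from the headers,
    # fall back to the encoding guessed from the body.
    if r.encoding is None:
        r.encoding = r.apparent_encoding
    return r.text
```

Calling `fetch_html('https://www.biologic.net/sales-network/')` would then return the page source as a string, ready to hand to the parser in the next step.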

2. Make a soup

Beautiful Soup is a Python library for pulling data out of HTML and XML files. To parse a document, pass it into the BeautifulSoup constructor.

from bs4 import BeautifulSoup
soup = BeautifulSoup(r.text, 'html.parser')
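
To see what the parsed object offers without touching the network, here is a small sketch that parses an inline HTML string. The markup and the company name are made up for illustration; a real page will be structured differently.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for a downloaded page.
html = """
<html>
  <head><title>Sales network</title></head>
  <body>
    <ul>
      <li><span>Acme Instruments</span><span>https://acme.example</span></li>
    </ul>
  </body>
</html>
"""

soup = BeautifulSoup(html, 'html.parser')

# Tags are reachable as attributes of the soup object.
title = soup.title.string
# find() returns the first matching tag; get_text(strip=True) trims whitespace.
first_span = soup.find('span').get_text(strip=True)

print(title)
print(first_span)
```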

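3. Locate the elements

Locate the elements that contain the information you need, such as company name and website, and collect their text into lists. The selectors below come from an example page where each list item holds a name, a position, an organization, and an email in consecutive span elements; for another page, inspect its HTML and adjust the selectors.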

expert_name_list = [x.text for x in soup.select("li > span:nth-child(1)")]
expert_position_list = [x.text for x in soup.select("li > span:nth-child(2)")]
expert_organization_list = [x.text for x in soup.select("li > span:nth-child(3)")]
expert_email_list = [x.text for x in soup.select("li > span:nth-child(4)")]
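
The following sketch shows how the `li > span:nth-child(n)` selectors pick out one column at a time. It runs against made-up sample markup (the names and addresses are placeholders), assuming the page puts each record in an `li` with four `span` children.

```python
from bs4 import BeautifulSoup

# Made-up sample: one <li> per record, one <span> per field.
html = """
<ul>
  <li><span> Alice </span><span>Chair</span><span>Org A</span><span>alice@example.com</span></li>
  <li><span>Bob</span><span>Member</span><span>Org B</span><span>bob@example.com</span></li>
</ul>
"""
soup = BeautifulSoup(html, 'html.parser')

# "li > span" matches spans that are direct children of an <li>;
# ":nth-child(n)" keeps only the n-th child element (counting from 1).
names = [x.get_text(strip=True) for x in soup.select("li > span:nth-child(1)")]
emails = [x.get_text(strip=True) for x in soup.select("li > span:nth-child(4)")]

# Pair the columns back up into one tuple per record.
rows = list(zip(names, emails))
print(rows)
```

Because each column is collected separately, a record with a missing span would shift the lists out of alignment, so it is worth checking that all lists have the same length before combining them.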

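Final code

A few lines of code are enough to get the job done. The script below applies the same steps to a page that lists experts with their positions, organizations, and emails.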

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Fetch the page and decode it as UTF-8
r = requests.get('http://www.cmba.org.cn/fzjg/wylist.aspx-nodeid=144&userid=52.htm')
r.encoding = 'utf-8'
soup = BeautifulSoup(r.text, 'html.parser')

# The selectors assume each <li> holds four <span> children:
# name, position, organization, email
expert_name_list = [x.text for x in soup.select("li > span:nth-child(1)")]
expert_position_list = [x.text for x in soup.select("li > span:nth-child(2)")]
expert_organization_list = [x.text for x in soup.select("li > span:nth-child(3)")]
expert_email_list = [x.text for x in soup.select("li > span:nth-child(4)")]

# Build a table with one column per field
data = {'Name': expert_name_list,
        'Position': expert_position_list,
        'Organization': expert_organization_list,
        'Email': expert_email_list}
frame = pd.DataFrame(data)

# str.strip() returns a new Series, so the result must be assigned back
for column in frame.columns:
    frame[column] = frame[column].str.strip()
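
The script stops once the table is in memory. As a hedged sketch of the last step, getting the data into a file a spreadsheet program can open, the example below cleans a small made-up DataFrame and writes it to CSV. The file name `channels.csv`, the sample rows, and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Made-up rows standing in for the scraped lists.
frame = pd.DataFrame({
    'Name': [' Alice ', 'Bob'],
    'Email': ['alice@example.com ', ' bob@example.com'],
})

# Strip surrounding whitespace and keep the cleaned values.
for column in frame.columns:
    frame[column] = frame[column].str.strip()

# Write to a temporary directory; 'utf-8-sig' adds a byte order mark
# so that Excel detects the encoding of non-ASCII text.
out_path = os.path.join(tempfile.mkdtemp(), 'channels.csv')
frame.to_csv(out_path, index=False, encoding='utf-8-sig')

# Read the file back to confirm what was written.
loaded = pd.read_csv(out_path, encoding='utf-8-sig')
print(loaded)
```

pandas also provides `DataFrame.to_excel` for writing an .xlsx file directly, which requires an Excel writer package such as openpyxl to be installed.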




By 谢现实