Python Pitfall Notes

《1》 Fixing hung Flask requests

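When using Flask in practice, I had requests that needed to call system commands, and the Flask development server easily ended up with a stuck thread.
After digging through posts online, I fixed it by changing how Flask is started: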

app.run(host='0.0.0.0', debug=True, threaded=True)
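
threaded=True lets other requests proceed while one is blocked, but the blocked request itself still waits for the system command to finish. A possible complement, not part of the original fix, is to give the command a timeout. The sketch below uses only the standard library; run_command is a hypothetical helper name and the 10 second default is an arbitrary assumption.

```python
import subprocess


def run_command(cmd, timeout=10):
    """Run cmd (a list of arguments) and return (returncode, output).

    returncode is None when the command did not finish within timeout seconds.
    """
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the captured output
            timeout=timeout,           # the child is killed if it runs too long
        )
    except subprocess.TimeoutExpired:
        return None, b''
    return done.returncode, done.stdout
```

Inside a view function this could be called as run_command(['ls', '-l']), returning an error response when the first element of the result is None.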

《2》 Importing Python modules across directory levels

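Given the directory structure below:

|---dir1
|------dir2
|----------test1.py
|------dir3
|----------test2.py

test2.py needs to import functions from test1.py. There are several ways to do this:
1. Change the environment: set PYTHONPATH (failed)
2. Change the system Python's directory lookup configuration (not tried)
3. Change my own code (succeeded)

#test2.py
import os
import sys
# test1.py lives in the sibling directory dir2; build that path from this file's location, not the working directory
sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "dir2"))
import test1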
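
A relative entry in sys.path is resolved against the current working directory, so whether the import works depends on where the script is launched from. As a rough sketch of a more reusable form, the helper below resolves a sibling directory from the importing file's own location. add_sibling_to_path is a hypothetical name, not something from the original notes.

```python
import os
import sys


def add_sibling_to_path(current_file, sibling_name):
    """Put <parent of current_file's directory>/<sibling_name> on sys.path.

    Returns the absolute path that was added (or was already present).
    """
    here = os.path.dirname(os.path.abspath(current_file))
    target = os.path.normpath(os.path.join(here, os.pardir, sibling_name))
    if target not in sys.path:
        sys.path.insert(0, target)
    return target
```

In test2.py this would be called as add_sibling_to_path(__file__, name_of_the_directory_holding_test1), followed by import test1.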

《3》 Daily scheduled task

#-*-coding:utf-8-*-
import datetime
import time

def doSth():
    print('test')
    # Sleep past the scheduled minute so the task does not fire twice
    time.sleep(60)

def main(h, m):
    '''h is the scheduled hour, m is the scheduled minute'''
    while True:
        while True:
            now = datetime.datetime.now()
            if now.hour==h and now.minute==m:
                break
            time.sleep(20)
        doSth()
# Run every day at 02:00
main(2,0)
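
The loop above wakes up every 20 seconds to check the clock. One alternative, sketched here under the assumption that local wall-clock time is good enough, is to compute how long it is until the next occurrence of h:m and sleep once. seconds_until is a hypothetical helper name.

```python
import datetime


def seconds_until(h, m, now=None):
    """Seconds from now until the next time the clock reads h:m:00."""
    if now is None:
        now = datetime.datetime.now()
    target = now.replace(hour=h, minute=m, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed (or is exactly now): use tomorrow's
        target += datetime.timedelta(days=1)
    return (target - now).total_seconds()
```

A scheduler loop could then alternate time.sleep(seconds_until(2, 0)) with the task call, with no need for the extra 60 second pause.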

《4》 Extracting substrings

A roundup of the string operations I use most often.

1. Get the content of a given tag from a web page

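from bs4 import BeautifulSoup

# Read the saved page; the with block closes the file afterwards
with open('app/static/uploads/test1.html') as f:
    html = f.read()
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html, 'html.parser')
allData = []
# count was undefined in the original; enumerate is assumed here to supply a 0-based index
for count, k in enumerate(soup.find_all('a')):
    value = k.text
    allData.append([count, value])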
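
If installing bs4 is not an option, roughly the same list can be built with the standard library's html.parser. This is a swapped-in technique, not what the notes above use, and it is far less forgiving of broken markup. AnchorTextParser and anchor_texts are hypothetical names.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Collect the text found inside each <a> element."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # greater than 0 while inside an <a> element
        self._buf = []    # text pieces of the <a> currently being read
        self.texts = []   # finished link texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == 'a' and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.texts.append(''.join(self._buf))
                self._buf = []

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def anchor_texts(html):
    """Return [[index, text], ...] for every <a> element in html."""
    parser = AnchorTextParser()
    parser.feed(html)
    parser.close()
    return [[i, text] for i, text in enumerate(parser.texts)]
```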

2. Get the part of a string before a given character

url = "http://192.168.43.131:82/openapi_typ/getAllNewCount?appType=0"
path = url[:url.rfind('?')]
#http://192.168.43.131:82/openapi_typ/getAllNewCount
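
Slicing at rfind('?') assumes the URL actually contains a '?'. When it does not, rfind returns -1 and the slice silently drops the last character. A small sketch using urllib.parse from the standard library avoids that edge case; strip_query is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return url without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))
```

For a quick one-liner, url.partition('?')[0] behaves the same way for URLs without a fragment.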

3. Get the characters after a given position

url = "http://192.168.43.131:82/openapi_typ/getAllNewCount?appType=0"
path = url[25:]
#openapi_typ/getAllNewCount?appType=0
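
The offset 25 is simply the length of "http://192.168.43.131:82/", so it breaks as soon as the host or port changes. A minimal sketch that derives the offset from a known prefix instead of hard-coding it; after_prefix is a hypothetical helper name.

```python
def after_prefix(text, prefix):
    """Return what follows prefix, or text unchanged if it does not start with prefix."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text
```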

4. Extract a variable part of a string and convert it to UTF-8

import re
# line is assumed to hold GBK-encoded bytes, so the pattern is a bytes pattern too
p = re.findall(br'"question":{"text":"(.+?)","url"', line)
print(p[0].decode('gbk').encode('utf-8'))
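
Indexing p[0] raises IndexError when the pattern does not match. The sketch below wraps the same regular expression in a function that returns None in that case. It assumes the input is GBK-encoded bytes, and question_text_utf8 is a hypothetical name.

```python
import re

# Same pattern as above, compiled once; the brace is escaped only for readability
QUESTION_RE = re.compile(br'"question":\{"text":"(.+?)","url"')


def question_text_utf8(line):
    """Return the question text re-encoded as UTF-8 bytes, or None if absent."""
    match = QUESTION_RE.search(line)
    if match is None:
        return None
    return match.group(1).decode('gbk').encode('utf-8')
```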