How It Started#
To-do item: build a local IP proxy pool on my computer.
The ultimate goal: a GUI client that can crawl proxies, use a random proxy, switch proxies, clear the proxy setting, and so on, so that changing the IP takes a single click.
Source material: see the article "Multithreaded Crawling of Xici High-Anonymity Proxies and Verification of Availability" in the Python directory.
Rewrite the Python Code for Crawling Xici Proxies#
Looking back, I found the code I wrote three months ago hard to accept, so I summarized its problems as follows (a programmer who cannot summarize is not a good security novice):
| Problem | Improvement |
|---|---|
| Used an unsuitable process pool, `Pool().apply_async`, so the list holding the results was not kept across processes; I fell back on reading and writing files, which made the code very messy | Re-learned when processes and threads are appropriate and switched to multithreading with a queue |
| Heavy file I/O caused compatibility problems in different environments | Stored the results in a MongoDB database instead |
| Low code reusability | Separated crawling and database insertion into class methods and made the insertion function reusable, which makes the code much cleaner |
| No exception handling | Added `try`/`except` in places prone to errors |
Reference tutorials:
- Detailed Explanation of Python Multiprocessing and Multithreading
- Python pymongodb | Beginner's Tutorial
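The table mentions moving to threads with a queue. The full code below simply lets every thread append to one shared list (list appends are atomic in CPython, so that works). As a minimal sketch of the queue variant, assuming a hypothetical `fetch_page(page)` that stands in for the real download-and-parse step, worker threads push results into a `queue.Queue` and the main thread drains it after `join()`:

```python
import queue
import threading


def fetch_page(page):
    # Hypothetical stand-in for "download page N and parse its proxies";
    # it returns fake proxy dicts so the sketch runs without network access.
    return [{"ip": "10.0.{}.{}".format(page, i), "port": "80"} for i in range(2)]


def crawl_pages(pages):
    results = queue.Queue()  # Thread-safe container shared by all workers

    def worker(page):
        try:
            for proxy in fetch_page(page):
                results.put(proxy)
        except Exception as e:  # One failed page should not stop the others
            print("Page {} failed: {}".format(page, e))

    threads = [threading.Thread(target=worker, args=(p,)) for p in pages]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # Wait for every worker before reading the results

    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected


proxies = crawl_pages(range(1, 4))
print(len(proxies))  # 3 pages x 2 fake proxies = 6
```

The queue mainly makes the hand-off between threads explicit, and it extends naturally to a producer/consumer design where verification runs in separate threads.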
Full code:
```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
'''
@author: soapffz
@function: Multithreaded crawling of Xici high-anonymity proxies and storing them in MongoDB
@time: 2019-04-20
'''
import pymongo  # MongoDB database operations
from threading import Thread  # Multithreading
from fake_useragent import UserAgent  # Fake user agent
import requests  # Request site
from lxml import etree  # Parse site
import re  # re parsing library
import telnetlib  # Telnet connection test for proxy validity (removed from the stdlib in Python 3.13)
import timeit  # Calculate time taken
import sys


class multi_threaded_crawl_proxies(object):
    def __init__(self):
        try:
            # Connect to MongoDB and get the connection object
            client = pymongo.MongoClient('mongodb://localhost:27017/')
            self.db = client.proxies  # Use the "proxies" database throughout this class
            print("Database connection successful!")
        except Exception as e:
            print("Database connection failed: {}".format(e))
            sys.exit(1)  # self.db would be undefined, so stop here
        self.ua = UserAgent()  # Used to generate User-Agent
        self.crawl_progress()

    def crawl_progress(self):
        # Crawling operation
        try:
            # Multiple crawling start functions can be added here
            self.xici_nn_proxy_start()
        except Exception as e:
            print("Program runtime error: {}\nProgram exit!".format(e))
            sys.exit(0)

    def xici_nn_proxy_start(self):
        xici_t_cw_l = []  # Crawling thread list
        self.xici_proxies_l = []  # Verified proxy dicts: ip, port, ip type, address
        for i in range(1, 21):  # Crawl 20 pages of proxies
            t = Thread(target=self.xici_nn_proxy, args=(i,))
            xici_t_cw_l.append(t)  # Add thread to thread list
            t.start()
        for t in xici_t_cw_l:  # Wait for all threads to finish before continuing
            t.join()
        self.db_insert(self.xici_proxies_l, "xici")  # Insert into database

    def xici_nn_proxy(self, page):
        # Xici proxy crawling function
        url = "https://www.xicidaili.com/nn/{}".format(page)
        # Without a User-Agent the site returns status code 503
        req = requests.get(url, headers={"User-Agent": self.ua.random})
        if req.status_code != 200:
            print("IP is blocked! Crawling page {} failed!".format(page))
            return  # exit() inside a worker thread would only end this thread anyway
        print("Crawling content of page {}...".format(page))
        content = req.content.decode("utf-8")
        tree = etree.HTML(content)
        # Use xpath to get all rows of the ip_list table, skipping the header row
        tr_nodes = tree.xpath('.//table[@id="ip_list"]/tr')[1:]
        for tr_node in tr_nodes:
            td_nodes = tr_node.xpath('./td')  # Cells of a single proxy row
            # The bar's style attribute holds a percentage (e.g. "width:88%"); take the number
            speed = int(re.split(r":|%", td_nodes[6].xpath(
                './div/div/@style')[0])[1])  # Speed value
            conn_time = int(re.split(r":|%", td_nodes[7].xpath(
                './div/div/@style')[0])[1])  # Connection time value
            # Skip the proxy if either speed or connection time is not ideal.
            # Use "or": "|" binds tighter than "<=" and turns this into a chained comparison.
            if speed <= 85 or conn_time <= 85:
                continue
            ip = td_nodes[1].text
            port = td_nodes[2].text
            ip_type = td_nodes[5].text.lower()
            td_address = td_nodes[3].xpath("a/text()")
            address = 'None'
            if td_address:  # Some addresses are empty; keep the default unless one is found
                address = td_address[0]
            try:
                # Connect via telnet; if it connects, the proxy is considered available
                with telnetlib.Telnet(ip, port, timeout=1):
                    pass
            except OSError:  # Refused, unreachable or timed out
                pass
            else:
                self.xici_proxies_l.append(
                    {"ip": ip, "port": port, "ip_type": ip_type, "address": address})

    def db_insert(self, proxies_list, collection_name):
        if proxies_list:
            # Select the collection; MongoDB creates it automatically if it does not exist
            collection = self.db[collection_name]
            collection.insert_many(proxies_list)  # Insert a list of dictionaries
        else:
            # Exit if the passed list is empty
            print("Proxy list is empty!\nProgram exit!")
            sys.exit(0)


if __name__ == "__main__":
    start_time = timeit.default_timer()
    multi_threaded_crawl_proxies()
    end_time = timeit.default_timer()
    print("Program finished running, total time: {}".format(end_time - start_time))
```
The effect is as follows:
[19-04-26 Update]
Modify Registry Parameters#
- Reference Article 1: Ultimate Solution to Automatically Switch Proxy IP on Windows Using Python!
- Reference Article 2: Conversion Between Strings, Hexadecimal Strings, Numbers, and Bytes in Python3
Now that we have proxies, we need to change the local proxy setting in Windows. After some research, I learned that:
- Windows accesses the external network using the proxy settings of the `IE` browser.
- Most articles online say the IE proxy settings are stored at
HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings
- However, according to the author of Reference Article 1, the value that actually needs to be modified is under
HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings\Connections
that is, the `Connections` key under the location above. Let's set a proxy of `127.0.0.1:80` in IE's proxy settings:
With the proxy enabled, it looks like this:
The registry export looks like this:
Windows Registry Editor Version 5.00
[HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings\Connections]
"DefaultConnectionSettings"=hex:46,00,00,00,0a,00,00,00,01,00,00,00,00,00,00,\
00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,\
00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00
"SavedLegacySettings"=hex:46,00,00,00,51,00,00,00,0b,00,00,00,0c,00,00,00,31,\
32,37,2e,30,2e,30,2e,31,3a,38,30,07,00,00,00,3c,6c,6f,63,61,6c,3e,00,00,00,\
00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,\
00,00,00,00,00,00,00,00
"Netkeeper"=hex:46,00,00,00,31,2c,30,30,0b,00,00,00,0c,00,00,00,31,32,37,2e,30,\
2e,30,2e,31,3a,38,30,07,00,00,00,3c,6c,6f,63,61,6c,3e,00,00,00,00,00,00,00,\
00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,\
00,00,00,00
"greenvpn"=hex:46,00,00,00,02,00,00,00,01,00,00,00,00,00,00,00,00,00,00,00,00,\
00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,\
00,00,00,00,00,00,00,00,00,00
Besides the IP address and port, the value contains some strange hexadecimal bytes. According to Reference Article 1, the fields are roughly as follows (image from Reference Article 1; thanks again to its author, SolomonXie):
To summarize, there are only a few parameters:
46 00 00 00 Switch 00 00 00 IP Length 00 00 00 IP Address 00 00 00 Whether to skip local proxy 21 00 00 00 PAC Address
I don't know which version of `IE` the author used; mine is `IE 11`, which is slightly different:
46 00 00 00 Incremental Position 00 00 00 Switch 00 00 00 IP Length 00 00 00 IP Address Whether to bypass local addresses
- Each numeric field is followed by three `00` bytes, i.e. `00 00 00`. (Most likely each field is a 4-byte little-endian integer, so a one-byte value is followed by three zero bytes.)
- Switch: represents the checkbox state in the IE settings. Using a proxy is `03`, not using one is `01`. Whether to bypass the proxy for local addresses is independent of this switch; it depends only on the final "bypass local addresses" field. You can also change the settings yourself and then open the registry to check.
- Incremental position: I don't know what value it starts counting from. Even if the settings are unchanged, it increases every time you click OK again. It's best to set it to `00`; later tests confirmed that setting it to `00` has no impact.
- IP length: in hexadecimal, counting the `.` and `:` characters. For example, `127.0.0.1:80` is 12 characters long, and the value in the registry is `0C`.
- IP address: convert each character of the `IP:port` string to its hexadecimal ASCII code.
- Bypass proxy for local addresses: if unchecked, it is all `0`; if checked, the value is:
07 00 00 00 3c 6c 6f 63 61 6c 3e
Note: if you have already started coding at this point, the hexadecimal value imported into the registry must not contain spaces or commas.
Apart from the leading `07` (the length of the string that follows, 7 characters), this segment spells `<local>`, a fixed value that does not need to be modified.
- Finally, pad with `0` so that the value is 75 bytes long when it contains an IP address and 56 bytes when it does not. In the comma-separated `.reg` export above, that is 224 and 167 characters (3 characters per byte, minus the last comma); without separators it is 150 and 112 hex digits, the numbers used in the code below.
Based on the above, in my case the registry value for enabling the proxy while bypassing it for local addresses should be:
46 00 00 00 00 00 00 00 03 00 00 00 IP Length 00 00 00 IP Address 07 00 00 00 3c 6c 6f 63 61 6c 3e
Just convert the `IP:port` string to a hexadecimal string and calculate its length to fill in.
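The same layout can also be built from bytes instead of hand-assembled hex text. A minimal sketch, assuming (as the `00 00 00` pattern and the export above suggest) that every numeric field is a 4-byte little-endian integer; the function name `build_connection_settings` and its parameters are illustrative, not part of the original code:

```python
import struct


def build_connection_settings(proxy="", enable=True, bypass_local=True,
                              counter=0, min_len=75):
    """Sketch of the Connections value layout described above (assumed format)."""
    flags = 0x03 if enable else 0x01                   # 03 = use proxy, 01 = no proxy
    proxy_bytes = proxy.encode("ascii")                # e.g. b"127.0.0.1:80"
    bypass = b"<local>" if bypass_local else b""
    blob = struct.pack("<III", 0x46, counter, flags)   # header, incremental counter, switch
    blob += struct.pack("<I", len(proxy_bytes)) + proxy_bytes
    blob += struct.pack("<I", len(bypass)) + bypass    # 07 00 00 00 + "<local>"
    return blob.ljust(min_len, b"\x00")                # pad with 00 bytes


value = build_connection_settings("127.0.0.1:80")
print(value.hex())  # hex string in the form REG ADD /t REG_BINARY /d expects
```

For `127.0.0.1:80` this produces the same string as the hex-text construction in the registry script below; the length prefixes come from `len()`, so there is no separate IP-length formatting step.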
Registry modification code:
Note that the hexadecimal values imported into the registry do not have spaces or commas.
```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
'''
@author: soapffz
@function: Modify the registry to set the system proxy
@time: 19-04-26
'''
import IPy  # Check if the IP is valid
import re
import subprocess  # Execute cmd commands
import sys

# Raw string so the backslashes in the registry path are not treated as escapes
REG_PATH = r"HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings\Connections"


def setproxy(hex_value):
    # Pass in the organized hexadecimal proxy settings
    try:
        vpn_name = "Netkeeper"  # Your dedicated network connection name
        # check=True raises CalledProcessError if REG ADD fails, instead of reporting success anyway
        subprocess.run(
            ["REG", "ADD", REG_PATH, "/v", vpn_name,
             "/t", "REG_BINARY", "/d", hex_value, "/f"],
            check=True, stdout=subprocess.PIPE)
        print("Registry import successful!")
    except Exception as e:
        print("Error modifying registry: {}\nProgram exit!".format(e))
        sys.exit(0)


def registry_value_construction(ip, port, proxy_switch, local_proxy_switch):
    # Registry value construction
    switch_options = {"1": "03", "0": "01"}  # Proxy switch options
    local_switch_options = {
        "1": "070000003c6c6f63616c3e", "0": ""}  # Bypass-local options (length 07 + "<local>")
    # Regular expression for port validity check (0-65535)
    port_regular_expression = r'^([0-9]|[1-9]\d|[1-9]\d{2}|[1-9]\d{3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5])$'
    if not re.search(port_regular_expression, port):  # Port is not between 0-65535
        print("Port does not meet the type\nProgram exit!")
        sys.exit(0)
    if not ip:  # If the IP is empty, ignore the port and set the address fields to 0
        former = "4600000000000000{}00000000000000{}".format(switch_options.get(
            proxy_switch), local_switch_options.get(local_proxy_switch))  # Fill in both switch options
        # Pad with 00 to 56 bytes (112 hex digits), otherwise the import fails; same below
        value = former + "00" * ((112 - len(former)) // 2)
        print("Registry parameter value construction completed with empty IP!")
    else:
        try:
            IPy.IP(ip)  # The registry accepts any IP, but we don't: an invalid IP raises and exits
            # hex() returns "0x..."; keep the digits after "x" and pad the length to two digits
            ip_len = hex(len(ip) + len(port) + 1).split("x")[-1].zfill(2)
            ip = bytes(ip, 'ascii').hex()  # Hexadecimal string of the IP
            port = bytes(port, 'ascii').hex()
            former = "4600000000000000{}000000{}000000{}3A{}{}".format(switch_options.get(
                proxy_switch), ip_len, ip, port, local_switch_options.get(local_proxy_switch)).lower()
            # Pad with 00 to 75 bytes (150 hex digits)
            value = former + "00" * ((150 - len(former)) // 2)
            print("Registry parameter value construction completed!")
        except Exception as e:
            print("Program error: {}\nProgram exit!".format(e))
            sys.exit(0)
    return value


if __name__ == "__main__":
    ip = "127.0.0.1"  # Set proxy IP
    port = "80"  # Set proxy port
    proxy_switch = "1"  # Proxy switch: "1" enables the proxy, "0" disables it
    local_proxy_switch = "1"  # "1": bypass the proxy for local addresses, "0": also proxy local addresses
    hex_value = registry_value_construction(
        ip, port, proxy_switch, local_proxy_switch)  # Hexadecimal string of the proxy settings
    setproxy(hex_value)  # Write the hexadecimal string to the registry
```
[19-05-03: Modified the ip_len statement]
Notes on some of the functions:
- IPy's IP validity check raises an error when the IP is empty, so the code first checks whether the IP is empty; if it is, the port value is ignored and everything except the switches is set to 0.
- The regular expression used to check the port accepts `0-65535`. As a memo, the regular expression for `1024-65535` is:
^(1(02[4-9]|0[3-9][0-9]|[1-9][0-9]{2})|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])$
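Port regexes like these are easy to get subtly wrong, so it is worth checking them by brute force. A small sketch (the helper names are illustrative) that tests every number below 70000 against both patterns, plus a regex-free alternative:

```python
import re

# The two port regexes quoted above
PORT_0_65535 = r'^([0-9]|[1-9]\d|[1-9]\d{2}|[1-9]\d{3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5])$'
PORT_1024_65535 = r'^(1(02[4-9]|0[3-9][0-9]|[1-9][0-9]{2})|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])$'


def matches_exactly(pattern, low, high, upper_bound=70000):
    """Return True if pattern accepts exactly the decimal strings for low..high."""
    compiled = re.compile(pattern)
    for n in range(upper_bound):
        expected = low <= n <= high
        if bool(compiled.fullmatch(str(n))) != expected:
            return False
    return True


def is_valid_port(text, low=0, high=65535):
    # Regex-free alternative: ASCII digits only, no leading zeros, then a range check
    return (text.isascii() and text.isdigit() and str(int(text)) == text
            and low <= int(text) <= high)


print(matches_exactly(PORT_0_65535, 0, 65535))        # expected: True
print(matches_exactly(PORT_1024_65535, 1024, 65535))  # expected: True
```

One detail worth knowing: with `re.search`, `$` also matches just before a trailing newline, so `"80\n"` passes the check in the script above; `re.fullmatch` (or the integer check) rejects it.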
The demonstration of the effect is as follows (click to enlarge on the computer):
[19-05-03 Completed]
After several days of working on a deep learning project, my thoughts are a bit scattered, so today I'm taking a break to write up the ideas behind the Windows local proxy client I finished earlier.
UI Design#
Background: I had already finished rewriting the multithreaded Xici proxy crawler and the part that modifies the system proxy through the registry.
Using the command line all the time is not pleasant, and this functionality is a good fit for a GUI. After various comparisons, I chose `pyqt5` (certainly not just because it has a convenient visual designer).
Installation#
My environment: Win10 + Anaconda3 + pip/conda with the Tsinghua mirror.
It seems that `spyder` depends on `pyqt5`: installing the latest `pyqt5` directly produces a warning that the version is too high for `spyder`. So install them in this order:
```shell
pip install spyder
pip install pyqt5
```
Basic Page Introduction#
After installing `pyqt5`, if you did not choose to add `Anaconda3` to the PATH environment variable during installation, add the relevant directory to PATH according to where your `python` packages are installed.
If `Anaconda` is already on PATH, just type `designer` in the command line to open Qt Designer. The interface and a basic introduction are as follows (click the image to enlarge on the computer):
The UI is entirely up to you. I recommend using `Frame` widgets from the `Containers` section inside the `MainWindow` to separate your functional areas, which makes it easier to handle each area later.
For a quick start, see: PyQT5 Quick Start Tutorial - 2 Qt Designer Introduction and Getting Started.
The articles by 翻滚吧挨踢男 are also very easy to understand: Full Directory of Python Beginner's Tutorial.
Signal & Slot#
PyQt5 has a unique signal & slot mechanism to handle events. Signals and slots are used for communication between objects. A signal is triggered when a specific event occurs, and a slot can be any callable object. When a signal is triggered, the connected slot is called.
This means you need to decide which object is the sender and which is the receiver. Here is an example:
**In `designer`, press F3 and F4 to switch between the widget editing mode and the signal/slot editing mode.**
Drag from the signal sender to the signal receiver, and the signal/slot editing dialog pops up. Here, we drag a `combobox` to a `textbrowser`:
**soapffz's suggestion: at first, try out the commonly used signals and slots of each component and export the `Python` code to see how the statements are built; later, only design the UI in Designer and write all signals and slots in the logic part of the code, achieving separation of UI and logic.**
Reference article: again by 翻滚吧挨踢男: PyQt5 Learning Notes 05----Qt Designer Signal Slot.
Code Export#
Now that the basic UI design is complete, let's add a signal/slot to the proxy-crawling dropdown as an example and look at the structure of the exported code, as shown in the figure:
You can also see it at the bottom right of the interface:
After installing `pyqt5` in my environment, you can run `pyuic5 -o xx.py xx.ui` in `cmd` to export the code.
If `pyuic5` cannot be executed, search online for a fix. The demonstration is as follows:
The structure of the exported code is as follows:
```python
# -*- coding: utf-8 -*-

# Form implementation generated from reading ui file 'local_prixies.ui'
#
# Created by: PyQt5 UI code generator 5.11.3
#
# WARNING! All changes made in this file will be lost!

from PyQt5 import QtCore, QtGui, QtWidgets


class Ui_MainWindow(object):
    def setupUi(self, MainWindow):
        MainWindow.setObjectName("MainWindow")
        MainWindow.resize(640, 480)
        MainWindow.setMinimumSize(QtCore.QSize(640, 480))
        MainWindow.setMaximumSize(QtCore.QSize(640, 480))
        # ... various component sizes, positions, widths, heights, and other properties ...

        self.retranslateUi(MainWindow)
        # The added signals and slots are here
        self.comboBox_proxychoose_crawl.currentTextChanged['QString'].connect(self.textBrowser_disp.append)
        QtCore.QMetaObject.connectSlotsByName(MainWindow)

    def retranslateUi(self, MainWindow):
        _translate = QtCore.QCoreApplication.translate
        MainWindow.setWindowTitle(_translate("MainWindow", "Windows Local Proxy Settings - by soapffz"))
        self.label_funcchoose.setText(_translate("MainWindow", "Function Selection"))
        # ... various contents within components ...
```
The exported code only has one class, and it must be instantiated to start. Here, we introduce an important concept: Separation of UI and Code Logic.
Separation of UI and Code Logic#
- Without separation
Add the following code to the end of the exported py file (make sure your class name is also `Ui_MainWindow`):
```python
if __name__ == "__main__":
    import sys
    app = QtWidgets.QApplication(sys.argv)
    MainWindow = QtWidgets.QMainWindow()
    MyWindow = Ui_MainWindow()
    MyWindow.setupUi(MainWindow)
    MainWindow.show()
    sys.exit(app.exec_())
```
This lets you start the interface directly from the exported `py` file (large GIF):
- With separation
Create a new `mainwindow.py` file that instantiates the interface and contains all the logic, so that UI and logic are separated. Add the following code (assuming your exported `py` file is named `ui.py`):
```python
from PyQt5 import QtCore, QtGui, QtWidgets
from ui import Ui_MainWindow


class MainWindow(QtWidgets.QMainWindow, Ui_MainWindow):
    def __init__(self):
        super().__init__()
        # Instantiate the UI interface
        self.setupUi(self)
        # After setupUi, self has all the widgets defined in Ui_MainWindow
        # Edit all signal slots and remaining UI parts here to separate UI and code logic
```
Create a new `main.py` file for startup and add the following code:
```python
from PyQt5 import QtCore, QtGui, QtWidgets
from mainwindow import MainWindow
from sys import argv
from sys import exit as sys_exit

if __name__ == "__main__":
    app = QtWidgets.QApplication(argv)  # Pass in the command line arguments
    mainWindow = MainWindow()  # Create interface instance
    mainWindow.show()  # Show interface
    sys_exit(app.exec_())
```
Then run `main.py` to see the interface. A few suggestions:
- Once you are familiar with it, only design the `UI` in `designer` without adding any signals and slots; add them all in the commented area of `mainwindow.py`.
- The parameters needed to initialize the interface should also go in `mainwindow.py`. For example:
In the proxy-crawling dropdown, we need to add the two options `Xici High-Anonymity` and `Other Not Available`. We can add the proxy name list during initialization in `mainwindow.py`, as follows:
`mainwindow.py`:
```python
from PyQt5 import QtCore, QtGui, QtWidgets
from ui import Ui_MainWindow


class MainWindow(QtWidgets.QMainWindow, Ui_MainWindow):
    # To separate UI and logic, we complete the basic logic operations in this class
    def __init__(self):
        # The initialization part is placed directly in this function
        super().__init__()
        # Instantiate the UI interface
        self.setupUi(self)
        # Proxy names; if there are new proxy functions, add their names here
        self.proxyname_l = ["Xici High-Anonymity", "Other Not Available"]
        self.ui_remaining_part()

    def ui_remaining_part(self):
        # Besides the basic UI, set each component's initial values and instantiate objects here
        # Insert the proxy name list into the two proxy comboboxes
        self.comboBox_proxychoose_crawl.addItems(self.proxyname_l)
        self.comboBox_proxychoose_setup.addItems(self.proxyname_l)
```
Other parameters needed to initialize the `UI` can likewise be set in `mainwindow.py` after calling `setupUi`.
**This way, when designing the `UI` we only need to give the important components meaningful object names, and we can happily edit their remaining properties and signals/slots in `mainwindow.py`. Even if the `UI` changes slightly, the original code logic is not affected, which achieves separation of `UI` and code logic.**
Reference article: Introduction to Separating UI and Logic in PyQt5.
Passing Extra Parameters in Signals and Slots#
Passing extra parameters in signals and slots is no less important than the separation of `UI` and code logic described above.
As mentioned, when designing the `UI` we focus only on the design, and all signals and slots are implemented in the logic part of the code. This requires knowing which events (signals) each component can emit and which slots are available. Connecting a component's event to a slot generally looks like this:
```python
self.component.event.connect(slot)
```
For example, the code that outputs the new content of the proxy-crawling dropdown whenever it changes is:
```python
self.comboBox_proxychoose_crawl.currentTextChanged['QString'].connect(self.textBrowser_disp.append)
```
The slot passed to `.connect` receives the value emitted by the event. What value is received? Here are a few common examples:
- For a button's `clicked` event, the slot receives the button's `checked` state (`False` for an ordinary, non-checkable button).
- For a dropdown's option-change event, the slot receives the value of the dropdown after the change.
- For a checkbox's `stateChanged` event, the slot receives the new state of the checkbox: 2 is checked, 0 is unchecked, and 1 is partially checked.
These are the values I found while writing code. If you use a component you have never used before and want to quickly see what is passed when one of its events fires, you can do this:
```python
class MainWindow(QtWidgets.QMainWindow, Ui_MainWindow):
    # To separate UI and logic, we complete the basic logic operations in this class
    def __init__(self):
        super().__init__()
        self.setupUi(self)
        # Placeholder: replace "component" and "event" with a real widget and signal
        self.component.event.connect(self.custom_function)

    def custom_function(self, parameter):
        print(parameter)
```
This way, every time the chosen event of that component fires, the value passed in is printed.
In other words, each component carries certain values when an event fires, and by experimenting I found that `connect` can only take a function name without parentheses: writing `self.f()` would call the function immediately and pass its return value instead.
So how can we pass our own parameters? We can use a `lambda`.
For example, suppose I have two buttons, `bt1` and `bt2`, and when I click one of them I want to know which one it was. The code is as follows:
```python
self.pushbutton_1.clicked.connect(lambda: self.whichbt(1))
self.pushbutton_2.clicked.connect(lambda: self.whichbt(2))

def whichbt(self, i):
    print("I just clicked button number {}".format(i))
```
This way, the function after `.connect` can be written with parentheses and take multiple parameters, and you can even pass the component itself:
```python
self.checkBox_1.stateChanged.connect(lambda: self.which_checkbox(self.checkBox_1, 1))
self.checkBox_2.stateChanged.connect(lambda: self.which_checkbox(self.checkBox_2, 2))

def which_checkbox(self, part, i):
    part.setChecked(False)
    print("The state of checkbox {} has changed, but I am now setting it to unchecked".format(i))
```
However, if we want to pass both the original parameters and extra parameters, I don't know how to achieve that. If anyone knows, please leave a message.
Reference article: Passing Extra Parameters in PyQt Signals and Slots.
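One way to get both is to let the slot accept the emitted value and pass it along with the extras. In PyQt5 this should look like `self.checkBox_1.stateChanged.connect(lambda state: self.which_checkbox(self.checkBox_1, 1, state))`, or `functools.partial(self.which_checkbox, self.checkBox_1, 1)`, which appends the emitted value after the fixed arguments (so `which_checkbox` would take `state` as its last parameter). The sketch below shows the same idea in plain Python, with a hypothetical `ToySignal` class standing in for a Qt signal so it runs without PyQt:

```python
from functools import partial


class ToySignal:
    """Hypothetical stand-in for a PyQt signal: calls every slot with the emitted args."""

    def __init__(self):
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)

    def emit(self, *args):
        for slot in self._slots:
            slot(*args)


log = []


def which_checkbox(name, i, state):
    # Extra parameters first, the signal's own value (the new state) last
    log.append((name, i, state))


state_changed_1 = ToySignal()
state_changed_2 = ToySignal()
# Option 1: a lambda that accepts the emitted value and forwards it with the extras
state_changed_1.connect(lambda state: which_checkbox("checkBox_1", 1, state))
# Option 2: functools.partial fixes the extras; the emitted value is appended
state_changed_2.connect(partial(which_checkbox, "checkBox_2", 2))

state_changed_1.emit(2)  # 2 = checked
state_changed_2.emit(0)  # 0 = unchecked
print(log)
```

When connecting inside a loop, bind the loop variable explicitly (`partial(f, i)` or `lambda state, i=i: f(i, state)`), because a plain lambda looks `i` up when it runs, not when it was created. Also avoid a lambda whose only parameter has a default, such as `lambda i=1: ...` on `clicked`: PyQt passes the `checked` flag into that parameter.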
After introducing the important parts, let's discuss some small problems I encountered while writing code and their solutions.
Line Input#
Besides random proxies, I also added custom proxies, so when the user enters an `ip` and `port`, the first step is to validate them. I found a high-quality article: Detailed Explanation of Basic Controls in PyQt5 - QLineEdit (Four).
From it, I learned how to set input restrictions on text boxes. The final code for the `ip` and `port` parts is as follows:
```python
# IP: read-only by default, with an IP address mask; the mask only restricts input to digits,
# so IP validity still needs to be verified
self.lineEdit_customizeip.setInputMask('000.000.000.000;_')
self.lineEdit_customizeip.setReadOnly(True)
# Port: read-only by default, restricted to 0-65535
# Regular expression for the allowed content
port_reg = QtCore.QRegExp(r"^([0-9]|[1-9]\d|[1-9]\d{2}|[1-9]\d{3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5])$")
# Custom text validator
pportregValidator = QtGui.QRegExpValidator(self)
# Set its regular expression
pportregValidator.setRegExp(port_reg)
# Attach the validator to the line edit
self.lineEdit_customizeport.setValidator(pportregValidator)
self.lineEdit_customizeport.setReadOnly(True)
```
After setting the `ip` mask, you can see the masked input box when you open the program:
Because characters outside the allowed set simply cannot be typed into the port box, there is nothing visible to show, so I won't demonstrate it.
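As the comment in the code says, the mask only restricts the characters; the address itself still has to be validated. The registry script uses IPy for this. A minimal alternative sketch with the standard-library `ipaddress` module (swapped in here, not what the original uses), assuming the text has already been read from the line edit with `text()`:

```python
import ipaddress


def is_valid_ipv4(text):
    """Return True if text is a dotted-quad IPv4 address such as 127.0.0.1."""
    text = text.strip()  # Drop surrounding whitespace before parsing
    try:
        ipaddress.IPv4Address(text)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False
    return True


for candidate in ["127.0.0.1", "256.1.1.1", "1.2.3", ""]:
    print(candidate, is_valid_ipv4(candidate))
```

`IPv4Address` rejects octets above 255, too few octets and empty strings, which covers the cases the mask cannot catch.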
Mutual Exclusion of Two Checkboxes#
I used `Check Box` widgets to choose whether the proxy mode is random or custom, but by default they are not mutually exclusive.
Reference article: PyQt5 Series Tutorial (15): Radio Button.
Radio buttons are automatically exclusive by default. If the auto-exclusive feature is enabled, radio buttons belonging to the same parent widget will be part of the same exclusive button group. Of course, adding them to a QButtonGroup can achieve mutual exclusion for multiple groups of radio buttons.
To achieve mutual exclusion between the random proxy and custom proxy buttons, we put them into a mutually exclusive QButtonGroup, as follows:
```python
# Put the two checkboxes into a mutually exclusive QButtonGroup to get a radio-button effect
self.btgp_mutuallyexclusive = QtWidgets.QButtonGroup(self.groupBox_setting)
self.btgp_mutuallyexclusive.addButton(self.checkBox_randomproxy)
self.btgp_mutuallyexclusive.addButton(self.checkBox_customizeproxy)
```
Here is the effect before adding them to the `QButtonGroup`:
And the effect after adding them:
Binding Events When Closing the Window#
I want the proxy to be cleared when the program exits, so that closing the program does not affect normal network use. I found a reference article: PyQt5 Programming (17): Window Events.
Its formatting is messy; its example pops up a dialog asking whether you are sure you want to close when you click the close button.
The closeEvent(self, event) method is called when closing the window by clicking the close button in the title bar or calling the close() method. The event parameter can obtain an object of the QCloseEvent class. To prevent the window from closing, the ignore() method must be called through this object; otherwise, the accept() method should be called.
The following example shows that clicking the close button will display a standard dialog asking for confirmation to close the window. If the user clicks the "Yes" button, the window will close; if the user clicks the "No" button, only the dialog will close, and the window will not close.
The code is as follows:
```python
import sys
from PyQt5 import QtWidgets


class MyWindow(QtWidgets.QWidget):
    def __init__(self):
        QtWidgets.QWidget.__init__(self)
        self.resize(300, 100)

    def closeEvent(self, e):
        # Ask for confirmation; "No" is the default button
        result = QtWidgets.QMessageBox.question(
            self, "Close Window Confirmation", "Are you sure you want to close the window?",
            QtWidgets.QMessageBox.Yes | QtWidgets.QMessageBox.No, QtWidgets.QMessageBox.No)
        if result == QtWidgets.QMessageBox.Yes:
            e.accept()  # Let the window close
            QtWidgets.QWidget.closeEvent(self, e)
        else:
            e.ignore()  # Keep the window open


if __name__ == "__main__":
    app = QtWidgets.QApplication(sys.argv)
    window = MyWindow()
    window.show()
    sys.exit(app.exec_())
```
The achieved effect is as follows:
Now we can bind the function that clears the settings to the close action:
```python
def closeEvent(self, e):
    # Clear the proxy settings when the close button is clicked
    self.pbp_of_clearsetup()
    e.accept()  # Then let the window close as usual
```
Functional Code Optimization#
Although I completed the functional code long ago, I found some imperfections in the original code and some functionalities that had not been implemented while writing the logic. Here, I will supplement the explanation.
Crawling Proxies and Database Operations#
When writing the proxy crawler, I only wrote the crawling and database-storage parts, but we also need the following:
When the user clicks to start the proxy, read the `ip` and `port` from the database into the list for the selected proxy source. When the user clicks to change the proxy, discard the current one and pick another at random from the list. When the proxy source is changed, clear that list; the next time the user clicks to start the proxy, it is read again from the database, so newly crawled proxies can be used.
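The behaviour described above boils down to a small cache with three operations. A minimal sketch, in which `ProxyPool` and its method names are hypothetical and the MongoDB read is replaced by any callable that returns the list of ip/port dictionaries:

```python
import random


class ProxyPool:
    """Hypothetical helper for the start / change / reload behaviour described above."""

    def __init__(self, loader):
        self._loader = loader  # Callable returning a list of {"ip": ..., "port": ...}
        self._proxies = []
        self.current = None

    def start(self):
        # Read from the "database" only when the cached list is empty
        if not self._proxies:
            self._proxies = list(self._loader())
        return self.change()

    def change(self):
        # Discard the proxy in use, then pick another one at random
        if self.current in self._proxies:
            self._proxies.remove(self.current)
        self.current = random.choice(self._proxies) if self._proxies else None
        return self.current

    def reset(self):
        # Called when the user switches to another proxy source
        self._proxies = []
        self.current = None


pool = ProxyPool(lambda: [{"ip": "10.0.0.1", "port": "80"},
                          {"ip": "10.0.0.2", "port": "8080"}])
first = pool.start()
second = pool.change()
print(first != second)  # the discarded proxy is never picked again
```

Returning `None` once the list is exhausted gives the GUI a simple signal to prompt for a new crawl or a reload.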
I put the proxy crawling and reading from the database into one function to avoid repeating the code for database connections, as shown in the partial code:
```python
try:
    # Read ip and port information from the corresponding collection
    collection_dict_l = self.db["{}".format(collection_name)].find({}, {
        "_id": 0, "ip": 1, "port": 1})
    for item in collection_dict_l:
        # Deduplication
        if item not in proxies_l:
            # Add to the list; each element is a dict with ip and port
            proxies_l.append(item)
    return proxies_l
except Exception as e:
    return "Program error: {}".format(e)
```
Reference article: Python MongoDB Query Document.
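`item not in proxies_l` scans the whole list for every document, so deduplication grows quadratically with the number of proxies. The dictionaries themselves are not hashable, so a sketch that keys on the `(ip, port)` pair instead (the function name is illustrative):

```python
def dedupe_proxies(docs):
    """Keep the first occurrence of each (ip, port) pair, preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        key = (doc["ip"], doc["port"])
        if key not in seen:  # Set lookup is O(1) on average
            seen.add(key)
            unique.append(doc)
    return unique


docs = [{"ip": "1.1.1.1", "port": "80"},
        {"ip": "1.1.1.1", "port": "80"},
        {"ip": "2.2.2.2", "port": "8080"}]
print(dedupe_proxies(docs))
```

It works on any iterable of dictionaries, including the cursor returned by `find()`.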
Other reference articles:
- Looking up a key by value in a Python dictionary: Python Basics - A Few Points on Looking Up Key by Value in a Dictionary.
- Reading registry values with winreg: Python Module _winreg Operating the Registry.
- PYQT5(3) The Problem of QProgressBar Freezing.
- Getting a list directly from pymongo's find: Python Mongodb Find Directly Output List.
- While googling, I found a fairly comprehensive document from `51testing.com`: Download Link.
- A Chinese translation of a PyQt5 tutorial, which I haven't read yet, kept as a memo: PyQt5-Chinese-tutorial.
Setting Proxy Part#
This part really tripped me up.
In the original code for constructing the registry function, after filtering out the empty IP and invalid port, the code for constructing the parameter value is as follows:
```python
try:
    IPy.IP(self.ip)  # The registry accepts any IP, but we don't: an invalid or empty IP raises and exits
    ip_len = hex(len(self.ip)+len(self.port)+1).replace('x', '')
    ip = bytes(self.ip, 'ascii').hex()  # Hexadecimal string of the IP
    port = bytes(self.port, 'ascii').hex()
    former = "4600000000000000{}000000{}000000{}3A{}{}".format(switch_options.get(
        self.proxy_switch), ip_len, ip, port, local_switch_options.get(self.local_proxy_switch)).lower()
    hex_value = former+"00"*int((150-int(len(former)))/2)
    print("Registry parameter value construction completed!")
except Exception as e:
    print("Program error: {}\nProgram exit!".format(e))
    exit(0)
```
The problem is this line, which builds the hexadecimal length of the `ip`:
```python
ip_len = hex(len(self.ip)+len(self.port)+1).replace('x', '')
```
It does not handle the case where `ip` + `:` + `port` is longer than 15 characters, i.e. when the length needs two hex digits:
- `127.0.0.1:80`: 12 characters, `ip_len=0C`
- `110.110.110.110:65535`: 21 characters, `ip_len=015`
The parameter value changes accordingly:
Proxy Switch 0000000C
Proxy Switch 000000015
An extra `0` appears in the middle! As a result, the value passed in changed from `4600...` to `04600...` (presumably because the hex string now had an odd number of digits), and the proxy did not work.
So we changed the `ip_len` statement to:
```python
# hex() returns a string starting with "0x"; keep the digits after "x" and pad the length to two digits
ip_len = hex(len(ip)+len(port)+1).split("x")[-1].zfill(2)
```
That solved the problem. The moral of the story: be meticulous.
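To see the difference, here is a small sketch comparing the original expression, the fixed one, and Python's `format(n, "02x")`, which gives the same two-digit result. The function names exist only for this comparison, and each takes the whole `ip:port` string, whose length equals `len(ip) + len(port) + 1`:

```python
def ip_len_buggy(proxy):
    # Original version: only removes the "x" from "0x..."
    return hex(len(proxy)).replace('x', '')


def ip_len_fixed(proxy):
    # Fixed version: digits after "x", padded to two characters
    return hex(len(proxy)).split("x")[-1].zfill(2)


def ip_len_format(proxy):
    # Equivalent one-liner: two lowercase hex digits
    return format(len(proxy), "02x")


for proxy in ["127.0.0.1:80", "110.110.110.110:65535"]:
    print(proxy, ip_len_buggy(proxy), ip_len_fixed(proxy), ip_len_format(proxy))
# 127.0.0.1:80 0c 0c 0c
# 110.110.110.110:65535 015 15 15
```

The longest possible `ip:port` string, `255.255.255.255:65535`, is 21 characters, so two hex digits are always enough here.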
Conclusion#
The final effect is as follows:
- Crawling proxies:
The database is empty and the IP has recently been blocked (click to enlarge on the computer):
Note: this was recorded with alternating pause/resume, and most frames were dropped to reduce the GIF size; for the actual timing, refer to the timestamps in the software's display area.
- Setting the proxy (click to enlarge on the computer):
  - To protect myself, my original IP is masked.
  - The code is not yet perfect, and it may connect to slow proxies. This GIF shows a proxy with decent connection speed, which is the ideal case. I will add a check before connecting when I publish the project on `github`.
  - Note: this was also recorded with alternating pause/resume, with most frames dropped to reduce the GIF size; refer to the timestamps in the software's display area.
There is too much code to post in full. After finishing the deep learning project in the next few days, I will open-source this project on `github` as my first open-source project.
Feel free to give it a `star`, open an `issue`, or whatever.
Finally, here is a picture of the UI design I drew while conceptualizing:
Now the software interface:
The article is complete~