Unverified commit a52b0ae6, authored by AntonioCiolino, committed by GitHub

Updated Link scraper to avoid NoneType error. (#90)

* Enable web scraping based on a URL and a simple filter.

* ignore yarn

* Updated Link scraper to avoid NoneType error.
parent 4072369f
@@ -7,4 +7,4 @@ __pycache__
v-env
.DS_Store
aws_cf_deploy_anything_llm.json
yarn.lock
@@ -80,12 +80,14 @@ def crawler():
   # traverse paragraphs from soup
   for link in soup.find_all("a"):
-    data = link.get('href').strip()
-    if filter_value in data:
-      print (data)
-      links.append(root_site + data)
-    else:
-      print (data + " does not apply for linking...")
+    data = link.get('href')
+    if (data is not None):
+      if filter_value in data:
+        data = data.strip()
+        print (data)
+        links.append(root_site + data)
+      else:
+        print (data + " does not apply for linking...")
   #parse the links found
   parse_links(links)
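
Note on the hunk above: `link.get('href')` returns `None` for an `<a>` tag that has no `href` attribute, so the old chained `.strip()` call raised the NoneType error named in the commit title. The guard can be sketched in isolation as below. This is only an illustration under stated assumptions, not the project's code: it swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra packages, `AnchorCollector` and `filter_links` are hypothetical names, and it strips whitespace before applying the filter, which the commit does afterwards.

```python
from html.parser import HTMLParser
from typing import List, Optional


class AnchorCollector(HTMLParser):
    """Hypothetical helper: records the href of every <a> tag (None if absent)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[Optional[str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Like link.get('href') in the diff: a missing attribute yields None.
            self.hrefs.append(dict(attrs).get("href"))


def filter_links(html: str, root_site: str, filter_value: str) -> List[str]:
    """Return root_site + href for each anchor whose href contains filter_value."""
    collector = AnchorCollector()
    collector.feed(html)
    links: List[str] = []
    for href in collector.hrefs:
        if href is None:
            continue  # anchor without an href, e.g. <a name="top">
        href = href.strip()
        if filter_value in href:
            links.append(root_site + href)
    return links


if __name__ == "__main__":
    page = '<a name="top">top</a><a href=" /docs/intro ">intro</a><a href="/blog/post">post</a>'
    print(filter_links(page, "https://example.com", "/docs"))
```

Concatenating `root_site + data`, as the commit does, assumes every matching `href` is a site-relative path; `urllib.parse.urljoin` would be one way to handle absolute URLs as well.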