This project is mirrored from https://github.com/Mintplex-Labs/anything-llm.
Pull mirroring updated .
- Jan 31, 2025
Timothy Carambat authored
-
Timothy Carambat authored
* Add tokenizer improvements via Singleton class linting
* dev build
* Estimation fallback when string exceeds a fixed byte size
* Add notice to tiktoken on backend
-
- Dec 30, 2024
Sean Hatfield authored
* add audio file validations
* patch sharp to support wavfile parsing
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
-
- Dec 11, 2024
Sean Hatfield authored
* fix scraping failed bug in link/bulk link scrapers
* reset submodule
* swap to networkidle2 as a safe mix for SPA and API-loaded pages, but also not hang on request heavy pages
* lint
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
-
- Nov 20, 2024
Timothy Carambat authored
-
- Nov 12, 2024
Sean Hatfield authored
remove openai whisper transcription provider response_format option
-
- Oct 31, 2024
Sean Hatfield authored
* allow 127.0.0.1 as valid url for scraping
* update comments and lint
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
-
- Oct 28, 2024
timothycarambat authored
-
- Oct 21, 2024
Sean Hatfield authored
* fix tree/blob github urls from branches not being loaded
* improve ux of github data connector
* lint
* patch Github URL parser to just validate with `URL` native parser
* uncheck LocalStorage of PAT for security reasons
---------
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
-
- Oct 18, 2024
timothycarambat authored
-
Sean Hatfield authored
handle non-ascii characters in urls
-
- Oct 03, 2024
Sean Hatfield authored
* support xlsx files
* lint
* create separate docs for each xlsx sheet
* lint
* use node-xlsx pkg for parsing xlsx files
* lint
* update error handling
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
-
- Oct 02, 2024
Timothy Carambat authored
-
- Sep 30, 2024
Blazej Owczarczyk authored
-
- Sep 26, 2024
Timothy Carambat authored
* Updated the `GitHubRepoLoader` class to use the new import syntax and adjust the `recursiveLoader` method accordingly.
* add @langchain/community to collector package.json
* fix: Improve handling of complex ignore patterns in GitLabRepoLoader
* refactor: use ignore package for simplified ignore logic
* run yarn lint
* add @langchain/community@^0.2.23
* remove unused dep lint
---------
Co-authored-by: Emil Rofors (aider) <emirof@gmail.com>
-
Blazej Owczarczyk authored
* Added an option to fetch issues from gitlab. Made the file fetching asynchronous to improve performance. #2334
* Fixed a typo in loadGitlabRepo.
* Convert issues to markdown.
* Fixed an issue with time estimate field names in issueToMarkdown.
* handle rate limits more gracefully + update checkbox to toggle switch
* lint
---------
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
Co-authored-by: shatfield4 <seanhatfield5@gmail.com>
-
Timothy Carambat authored
-
- Sep 25, 2024
Sean Hatfield authored
* support more confluence url formats
* use pattern matching for confluence urls and manual splitting as fallback
* rework entire Confluence flow to prevent issues with custom, local, and cloud spaces
* remove dep
---------
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
-
- Sep 19, 2024
Sean Hatfield authored
confluence custom domain fix
-
Timothy Carambat authored
* Fix gitlab data connector for self-hosted instances (#2315)
* Linting fix.
* Load all branches in the GitLab data connector #2319
* #2319 lint fixes.
* update fetch on fail
---------
Co-authored-by: Błażej Owczarczyk <blazeyy@gmail.com>
-
- Sep 18, 2024
Blazej Owczarczyk authored
* Fix gitlab data connector for self-hosted instances (#2315)
* Linting fix.
-
- Sep 09, 2024
timothycarambat authored
connect #2243
-
- Sep 06, 2024
timothycarambat authored
-
- Aug 14, 2024
timothycarambat authored
resolves #2114
-
timothycarambat authored
-
- Aug 12, 2024
Sean Hatfield authored
fix depth handling in bulk link scraper
-
- Aug 10, 2024
Lea Anthony authored
Support Go filetype
-
- Aug 06, 2024
Mehmet Ünlü authored
fix: remove unnecessary break Remove unnecessary break that prevents checking next pages for blob objects.
-
Sean Hatfield authored
youtube loader whitespace fix
-
- Jul 25, 2024
Timothy Carambat authored
* Remove unused deps
* improve dependency
-
- Jul 23, 2024
Timothy Carambat authored
* Add support for GitLab repo collection as well as Github Repo collection
* Refactor for repo collectors to be more compact
---------
Co-authored-by: Emil Rofors <emirof@gmail.com>
-
- Jul 20, 2024
timothycarambat authored
-
timothycarambat authored
-
- Jul 16, 2024
Sean Hatfield authored
use pdf.js by importing it from pdf-parse and fix custom PDFLoader module
-
- Jul 11, 2024
Sean Hatfield authored
* implement custom PDFLoader to remove LC dep
* remove unneeded comment
* remove pdfjs as dep and fix page splitting using pdf-parse
* linting + export rename for desktop compat
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
-
- Jul 04, 2024
timothycarambat authored
-
- Jul 03, 2024
Timothy Carambat authored
-
Sean Hatfield authored
* WIP replace langchain pdfloader with pdfjs and add more context to each page
* remove extras from pdfjs and just replace langchain library
* remove unneeded dep
* fix console log in docs
---------
Co-authored-by: timothycarambat <rambat1010@gmail.com>
-
Sean Hatfield authored
implement custom confluence loader to extract code blocks properly from documents
Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
-
- Jul 01, 2024
Sean Hatfield authored
patch website depth data connector to work for other links that are not root url
-