Unverified commit e8f8bea9 authored by Fabian Wimmer, committed by GitHub

feat: add boundingBox and targetPages to LlamaParseReader (#1017)

parent 304484b7
---
"llamaindex": patch
---
feat: add boundingBox and targetPages to LlamaParseReader
@@ -44,6 +44,8 @@ They can be divided into two groups.
- `pageSeperator?` Optional. The page separator to use. Default is `\\n---\\n`.
- `gpt4oMode` Set to `true` to use GPT-4o to extract content. Default is `false`.
- `gpt4oApiKey?` Optional. Sets the GPT-4o API key. Using your own API key lowers the cost of parsing; your OpenAI account will be charged. Can also be set in the environment variable `LLAMA_CLOUD_GPT4O_API_KEY`.
- `boundingBox?` Optional. Specifies an area of the document to parse. Expects the bounding box margins as a comma-separated string in clockwise order (top, right, bottom, left), e.g. `boundingBox = "0.1,0,0,0"` to skip the top 10% of the document.
- `targetPages?` Optional. Specifies which pages to parse as a comma-separated list of page numbers. The first page is `0`.
- `numWorkers` As in the Python version, this is set in `SimpleDirectoryReader`. Default is `1`.
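Because `boundingBox` is a plain string, the order and units of the margins are easy to get wrong. The sketch below builds the string from named margins, assuming (from the clockwise order and the `"0.1,0,0,0"` example above) that the values run top, right, bottom, left and are fractions of the page. `Margins` and `toBoundingBox` are hypothetical helpers, not part of `llamaindex`, and the range checks are this sketch's own assumptions rather than documented API constraints.

```ts
// Hypothetical helper, not part of llamaindex: margins are assumed to be
// fractions of the page, listed clockwise from the top.
interface Margins {
  top: number;
  right: number;
  bottom: number;
  left: number;
}

function toBoundingBox(m: Margins): string {
  const values = [m.top, m.right, m.bottom, m.left];
  for (const v of values) {
    // Assumed constraint: each margin lies in [0, 1).
    if (!Number.isFinite(v) || v < 0 || v >= 1) {
      throw new RangeError(`margin out of range: ${v}`);
    }
  }
  // Opposite margins that meet or overlap would leave no area to parse.
  if (m.top + m.bottom >= 1 || m.left + m.right >= 1) {
    throw new RangeError("margins leave no area to parse");
  }
  // Clockwise order: top, right, bottom, left.
  return values.join(",");
}

// Skip the top 10% of the document, as in the option description.
const skipHeader = toBoundingBox({ top: 0.1, right: 0, bottom: 0, left: 0 });
console.log(skipHeader); // 0.1,0,0,0
```

The resulting string would then be passed as the `boundingBox` option of `LlamaParseReader`.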
### LlamaParse with SimpleDirectoryReader
......
@@ -8,7 +8,7 @@ In JSON mode, LlamaParse will return a data structure representing the parsed ob
## Usage
-For JSON mode, you need to use `loadJson`. The `resultType` is automatically set with this method. Currently it can't be used with `SimpleDirectoryReader`.
+For JSON mode, you need to use `loadJson`. The `resultType` is automatically set with this method.
More information about indexing the results on the next page.
```ts
......
@@ -133,6 +133,10 @@ export class LlamaParseReader extends FileReader {
gpt4oMode: boolean = false;
// The API key for the GPT-4o API. Optional, lowers the cost of parsing. Can be set as an env variable: LLAMA_CLOUD_GPT4O_API_KEY.
gpt4oApiKey?: string;
// The bounding box to use when extracting text from documents, described as a string of margins in clockwise order (top, right, bottom, left).
boundingBox?: string;
// The target pages to extract text from, described as a comma-separated list of page numbers. The first page of the document is page 0.
targetPages?: string;
// Whether or not to ignore and skip errors raised during parsing.
ignoreErrors: boolean = true;
// numWorkers is implemented in SimpleDirectoryReader
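Since `targetPages` counts from page `0`, page numbers read off a PDF viewer (which usually start at 1) need shifting before they are passed in. A minimal sketch with a hypothetical `toTargetPages` helper (not part of the library) that validates the indices and optionally converts from one-based numbering; the validation rules are assumptions of this sketch.

```ts
// Hypothetical helper, not part of llamaindex: builds the comma-separated
// `targetPages` string, where the first page is 0.
function toTargetPages(pages: readonly number[], oneBased = false): string {
  if (pages.length === 0) {
    throw new RangeError("at least one page is required");
  }
  // Shift one-based page numbers (as shown in most PDF viewers) to zero-based.
  const indices = pages.map((p) => (oneBased ? p - 1 : p));
  for (const i of indices) {
    if (!Number.isInteger(i) || i < 0) {
      throw new RangeError(`invalid page index: ${i}`);
    }
  }
  return indices.join(",");
}

console.log(toTargetPages([0, 2, 5])); // 0,2,5
// Pages 1, 3 and 6 as numbered in a viewer map to the same indices.
console.log(toTargetPages([1, 3, 6], true)); // 0,2,5
```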
@@ -183,6 +187,8 @@ export class LlamaParseReader extends FileReader {
page_seperator: this.pageSeperator,
gpt4o_mode: this.gpt4oMode?.toString(),
gpt4o_api_key: this.gpt4oApiKey,
bounding_box: this.boundingBox,
target_pages: this.targetPages,
};
// Appends body with any defined LlamaParseBodyParams
......
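The body construction above forwards only the parameters that are defined, so leaving `boundingBox` or `targetPages` unset should keep `bounding_box` and `target_pages` out of the request. The sketch below illustrates that filtering step in isolation; `BodyParams` and `definedEntries` are hypothetical names, and the actual appending code in `LlamaParseReader` is elided in this diff and may differ.

```ts
// Hypothetical sketch: keep only the body parameters that are set.
type BodyParams = Record<string, string | undefined>;

function definedEntries(params: BodyParams): [string, string][] {
  const entries: [string, string][] = [];
  for (const [key, value] of Object.entries(params)) {
    // Options left undefined are not sent at all.
    if (value !== undefined) {
      entries.push([key, value]);
    }
  }
  return entries;
}

// Only boundingBox is configured here; targetPages is left unset.
const sent = definedEntries({
  bounding_box: "0.1,0,0,0",
  target_pages: undefined,
});
console.log(JSON.stringify(sent)); // [["bounding_box","0.1,0,0,0"]]
```

Each remaining pair could then be appended to a `FormData` request body with `append(key, value)`.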