Process a subset of files
The following information applies only to the Unstructured Ingest CLI and the Unstructured Ingest Python library.
The Unstructured SDKs for Python and JavaScript/TypeScript and the Unstructured open-source library do not support this functionality.
Task
You want to process only files with specified extensions, only files at or below a specified size, or both.
Approach
For the Ingest CLI, use the following command options. For the Ingest Python library, use the following parameters for the FiltererConfig object.
- Use --file-glob (CLI) or file_glob (Python) to specify the list of file extensions to process.
- Use --max-file-size (CLI) or max_file_size (Python) to specify the maximum size of files to process, in bytes.
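To make the combined effect of the two filters concrete, the following is a minimal sketch of the selection logic using only the Python standard library. It does not call the Unstructured Ingest library and is not its implementation. The select_files helper is hypothetical, and the sketch assumes that patterns are glob-style (such as *.pdf) matched case-sensitively against the file name, and that a file exactly at the size limit is kept.

```python
from fnmatch import fnmatchcase
from pathlib import Path
import tempfile


def select_files(input_dir, file_glob=None, max_file_size=None):
    """Hypothetical helper: return the files under input_dir that pass both filters.

    file_glob     -- list of glob patterns such as ["*.pdf", "*.eml"]; None disables the filter.
    max_file_size -- maximum file size in bytes; None disables the filter.
    """
    selected = []
    for path in sorted(Path(input_dir).rglob("*")):
        if not path.is_file():
            continue
        # Assumption: a file passes if its name matches at least one pattern.
        if file_glob and not any(fnmatchcase(path.name, pattern) for pattern in file_glob):
            continue
        # Assumption: a file exactly at the limit is still processed.
        if max_file_size is not None and path.stat().st_size > max_file_size:
            continue
        selected.append(path)
    return selected


if __name__ == "__main__":
    # Build a throwaway directory with a mixture of file types and sizes.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "small.pdf").write_bytes(b"x" * 10)
        (root / "large.pdf").write_bytes(b"x" * 200_000)
        (root / "message.eml").write_bytes(b"x" * 50_000)
        (root / "notes.txt").write_bytes(b"x" * 10)

        # 100 KB is taken here as 100,000 bytes.
        for path in select_files(root, ["*.pdf", "*.eml"], 100_000):
            print(path.name, path.stat().st_size)
```

When both filters are set, a file must pass both. In this sketch, large.pdf is dropped for its size and notes.txt for its extension.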
To run this example
The following example processes only .pdf and .eml files that have a file size of 100 KB or less. To run this example, you should have a directory with a mixture of files, including at least one .pdf file and one .eml file, and with at least one of these files having a file size of 100 KB or less.
Code