
Documents

In whyhow, documents are the primary data source for creating knowledge graphs. They contain the raw information that is processed, analyzed, and transformed into structured entities, relationships, and insights. This guide explains what documents are in whyhow and how the platform uses them.

What are Documents?

Documents in whyhow refer to the files or data sources that hold the unstructured or semi-structured information you want to incorporate into your knowledge graph. These documents can come in various formats, such as:

  • Document files (e.g. PDF, TXT)
  • Tabular files (e.g. CSV)
  • Structured data (e.g. JSON)

Documents contain the raw text, data, and metadata that will be extracted, analyzed, and transformed into meaningful entities and relationships within your knowledge graph.
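
As an illustration of the format list above, the following minimal sketch maps a file name to one of the three categories by its extension. The `FORMAT_CATEGORIES` table and the `categorize` function are hypothetical names introduced for this example; they are not part of whyhow, and the platform may detect formats differently.

```python
from pathlib import Path
from typing import Optional

# Hypothetical lookup table built from the formats listed in this guide.
FORMAT_CATEGORIES = {
    ".pdf": "document file",
    ".txt": "document file",
    ".csv": "tabular file",
    ".json": "structured data",
}


def categorize(filename: str) -> Optional[str]:
    """Return the category for a file name, or None if the format is not listed."""
    return FORMAT_CATEGORIES.get(Path(filename).suffix.lower())


for name in ["report.PDF", "customers.csv", "events.json", "slides.pptx"]:
    print(f"{name}: {categorize(name) or 'not listed'}")
```

The lookup is case-insensitive on the extension, so `report.PDF` and `report.pdf` land in the same category.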

Document Upload and Management

To use documents in whyhow, you need to upload them to your workspace. The platform provides an intuitive interface for uploading and managing documents:

  1. Navigate to your workspace and click the "Documents" tab.
  2. Click the "Upload Document" button and select the file(s) you want to upload.
  3. Provide a name and description for each document to help identify and organize them within the workspace.
  4. Once uploaded, the documents appear in the document list along with metadata such as file type, size, and upload date.

You can also perform various management tasks on the uploaded documents, such as:

  • Viewing document details and metadata
  • Downloading documents for local use
  • Updating document names and descriptions
  • Deleting documents that are no longer needed


Document Processing and Analysis

After you upload documents to your workspace, whyhow processes and analyzes them to extract the information that goes into the knowledge graph. The platform applies natural language processing (NLP) techniques and machine learning algorithms in three stages:

  1. Text Extraction: The platform extracts the raw text content from the documents, handling various file formats and structures. It identifies and separates the main text from metadata, headers, footers, and other non-essential elements.

  2. Named Entity Recognition (NER): whyhow applies NER techniques to identify and extract named entities from the document text, such as people, organizations, locations, and dates. The extracted entities form the basis for creating nodes in the knowledge graph.

  3. Relationship Extraction: The platform analyzes the context and syntax of the document text to identify relationships between the extracted entities. It uses techniques like co-reference resolution, dependency parsing, and pattern matching to infer connections and associations between entities.

The processed and analyzed documents are the foundation for generating the knowledge graph. The extracted entities, relationships, and insights are structured and connected based on the defined schemas and ontologies within the workspace.
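
To make the three stages concrete, here is a deliberately simplified sketch in plain Python. It is not whyhow's implementation. It swaps in a dictionary lookup (a gazetteer) for NER and uses only regular-expression pattern matching for relationship extraction, with no co-reference resolution or dependency parsing. The `GAZETTEER`, `PATTERNS`, and function names are all assumptions made for the example.

```python
import re

# Hypothetical gazetteer: known entity name -> entity type.
GAZETTEER = {
    "Ada Lovelace": "Person",
    "Analytical Engines Ltd": "Organization",
    "London": "Location",
}

# Hypothetical patterns: phrase expected between two entities -> relation label.
PATTERNS = [
    (r"works at", "WORKS_AT"),
    (r"is based in", "BASED_IN"),
]


def extract_text(raw):
    """Stage 1: keep the main text, dropping blank lines and 'Page N' footers."""
    lines = (line.strip() for line in raw.splitlines())
    return " ".join(
        line for line in lines if line and not re.fullmatch(r"Page \d+", line)
    )


def extract_entities(text):
    """Stage 2: return (name, type) pairs for gazetteer entries, in order of first appearance."""
    found = [
        (text.find(name), name, kind)
        for name, kind in GAZETTEER.items()
        if name in text
    ]
    return [(name, kind) for _, name, kind in sorted(found)]


def extract_relations(text, entities):
    """Stage 3: emit (head, label, tail) when a pattern phrase links two entities."""
    names = [name for name, _ in entities]
    triples = []
    for head in names:
        for tail in names:
            if head == tail:
                continue
            for phrase, label in PATTERNS:
                regex = re.escape(head) + r"\s+" + phrase + r"\s+" + re.escape(tail)
                if re.search(regex, text):
                    triples.append((head, label, tail))
    return triples


raw = """Page 1
Ada Lovelace works at Analytical Engines Ltd.

Analytical Engines Ltd is based in London.
"""

text = extract_text(raw)
nodes = extract_entities(text)
edges = extract_relations(text, nodes)
print(nodes)
print(edges)
```

In this toy version the entities become graph nodes and the triples become edges. A production pipeline would replace each stage with a trained model or a dedicated parser.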

Best Practices for Document Preparation

To ensure optimal results and efficiency when working with documents in whyhow, consider the following best practices:

  1. File Formats: Use one of the currently supported file formats: PDF, TXT, JSON, or CSV. Support for more formats is coming soon.

  2. Document Quality: Ensure that your documents are of good quality and free from excessive noise, formatting issues, or corrupted content. Clean and preprocess the documents if necessary to improve the accuracy of the extraction and analysis process.

  3. Relevant Content: Include only relevant and meaningful content in your documents. Remove irrelevant or redundant information that may introduce noise or confusion into the knowledge graph.

  4. Consistent Formatting: Use standard headings, paragraphs, and lists to maintain consistent document formatting. This helps the platform better understand the structure and hierarchy of the content.

  5. Metadata: Provide accurate and descriptive metadata for your documents, such as titles, authors, dates, and keywords. This additional context can aid in organizing entities and relationships.
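
Some of these practices can be checked locally before upload. The sketch below is one possible pre-upload check, assuming the document text is already available as a string. The `clean_text` and `check_document` helpers and the choice of required metadata fields (`title`, `author`, `date`) are hypothetical and not part of whyhow.

```python
import re
from pathlib import Path

# Formats named in this guide.
SUPPORTED = {".pdf", ".txt", ".json", ".csv"}

# Hypothetical set of metadata fields to require before upload.
REQUIRED_METADATA = ("title", "author", "date")


def clean_text(text):
    """Collapse runs of spaces and tabs, trim each line, and drop consecutive duplicate lines."""
    cleaned = []
    for line in text.splitlines():
        line = re.sub(r"[ \t]+", " ", line).strip()
        if cleaned and line == cleaned[-1]:
            continue  # repeated line; this also collapses runs of blank lines
        cleaned.append(line)
    return "\n".join(cleaned).strip()


def check_document(filename, text, metadata):
    """Return a list of problems found; an empty list means the document looks ready."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if not text.strip():
        problems.append("document is empty")
    for field in REQUIRED_METADATA:
        if not metadata.get(field):
            problems.append(f"missing metadata: {field}")
    return problems


print(clean_text("Header\nHeader\n\n\nBody   text  \n"))
print(check_document("notes.docx", "  ", {"title": "Notes"}))
```

Returning a list of problems instead of raising on the first one lets you report every issue with a document in a single pass.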

By following these best practices and leveraging whyhow's document processing capabilities, you can effectively transform your unstructured data into valuable knowledge graphs that drive insights and decision-making.

For more information on uploading and managing documents, refer to the Document Upload Guide. For details on the document processing and analysis techniques used in whyhow, consult the Developer Guide.