Disparate Data Challenge

Demo Instructions

  2. Search for content in the data sets!

    • Searching: You can enter queries in the search bar at the top. When the search bar contains no query, the most recently accessed/modified resources show up in the results.

    • Opening: Clicking on a result in the list will open the resource directly. You also have the option to show a resource on the website it comes from. You can do this by right-clicking a result and selecting “Show in Context” from the dropdown menu (or clicking the “Show in Context” button in the right sidebar).

    • Switching view modes: You can choose among three view modes: Smart Tag, Description, and Compact. The Smart Tag view mode shows the Smart Tags Meta has automatically applied to each resource, giving you a quick overview of each search result. The Description view mode shows a description of each resource, taken directly from the resource itself. The Compact view mode shows less information for each result, but fits more results per page. To switch view modes, click the username (Disparate Data) in the top right of the interface and select one of the view modes at the bottom of the dropdown.

    • Filtering results: In addition to searching, you can also filter the results by date and file type. You can access the filters by clicking the arrow at the far right of the search bar. In the Smart Tag view mode, you can also click a Smart Tag in the results list to apply it as a filter.

Description of Solution

Our solution (Meta) is a simple, user-friendly search tool that allows you to search for content across all of the challenge data sets.

Meta solves the problem of information accessibility; it collects data from many different databases and makes it searchable from a single place. This minimizes the amount of time you spend searching for and collecting content, and maximizes the time you spend interacting with and understanding the content.

Meta is designed to connect to all of your cloud services (Google Drive, Dropbox, Evernote, OneDrive, Slack, Gmail, Trello). For the purposes of this challenge, we modified our existing framework to create a generic ETL (Extract, Transform, Load) system. We crawled the provided sources, stored all of the crawled data in Google Cloud Storage (Google’s counterpart to Amazon S3), and converted it to a single consistent schema. This allows Meta to quickly search all the data sources at once.
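The "transform" step of such an ETL system can be sketched as below. This is only an illustration of mapping differently shaped source records onto one schema, not Meta's actual code: the Resource fields, the two source shapes, and the function names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Resource:
    """Hypothetical unified record; field names are illustrative only."""
    source: str
    title: str
    url: str
    file_type: str
    modified: datetime
    description: str = ""
    tags: list[str] = field(default_factory=list)


def transform_portal_item(item: dict) -> Resource:
    """Map a record from an assumed web-portal API onto the shared schema."""
    return Resource(
        source="portal",
        title=item["name"].strip(),
        url=item["link"],
        file_type=item.get("format", "unknown").lower(),
        # This assumed source reports modification time as epoch seconds.
        modified=datetime.fromtimestamp(item["updated_epoch"], tz=timezone.utc),
        description=item.get("summary", ""),
    )


def transform_file_listing(entry: dict) -> Resource:
    """Map an entry from an assumed storage listing onto the shared schema."""
    filename = entry["path"].rsplit("/", 1)[-1]
    stem, _, ext = filename.rpartition(".")
    return Resource(
        source="files",
        title=stem or filename,
        url=entry["path"],
        # No usable extension means the type is recorded as unknown.
        file_type=ext.lower() if stem else "unknown",
        # This assumed source reports modification time as an ISO 8601 string.
        modified=datetime.fromisoformat(entry["last_modified"]),
    )
```

Once every source has its own small transform function, the search layer only ever needs to understand the one Resource shape.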

Meta’s search is powered by Smart Tags. When Meta processes content across the disparate databases, it extracts the most important concepts, topics, subjects, etc. from the content, metadata, and descriptions, and applies them automatically as tags to each file or web resource.
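To make the idea of automatic tagging concrete, here is a deliberately naive stand-in that picks the most frequent non-trivial words in a resource's text. It is not how Meta generates Smart Tags; the stop-word list, the extract_tags name, and the frequency-based ranking are assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {
    "the", "a", "an", "of", "and", "in", "for", "to",
    "on", "with", "is", "are", "by", "from",
}


def extract_tags(text: str, max_tags: int = 5) -> list[str]:
    """Return up to max_tags of the most frequent words as candidate tags."""
    words = re.findall(r"[a-z][a-z0-9]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    # Sort by descending count, then alphabetically, so output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:max_tags]]
```

For example, extract_tags("Flood maps of the river basin. Flood risk in the basin.", max_tags=2) returns ["basin", "flood"]. The same function could be run over a file's content, its metadata, and its description, with the results merged into one tag set.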

These Smart Tags aren’t only useful for search. They also provide an excellent overview, so you can quickly scan through search results and determine what each resource is and whether or not it will be helpful for your current task. Smart Tags can also serve as filters to help narrow down search results. Simply click a Smart Tag in the results list, and the results will be limited to resources that match that Smart Tag. Filtering by Smart Tag lets you uncover stories that might exist across disparate data sources and multiple file formats.
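Filtering by Smart Tag can be sketched as a simple set-membership test. The sample records and the filter_by_tags helper below are hypothetical, and requiring every selected tag to match (AND semantics) when several tags are chosen is an assumption; the text above only describes filtering on a single tag.

```python
from datetime import date

# Hypothetical already-tagged resources from different sources and formats.
resources = [
    {"title": "Flood extent 2015", "file_type": "kml",
     "modified": date(2016, 3, 1), "tags": {"flood", "hydrology"}},
    {"title": "Census summary", "file_type": "csv",
     "modified": date(2016, 2, 10), "tags": {"population"}},
    {"title": "Flood damage estimates", "file_type": "xlsx",
     "modified": date(2016, 4, 5), "tags": {"flood", "population"}},
]


def filter_by_tags(items, selected_tags):
    """Keep only the items that carry every selected tag."""
    selected = set(selected_tags)
    return [item for item in items if selected <= item["tags"]]


flood_hits = filter_by_tags(resources, ["flood"])
```

Here flood_hits contains both the KML file and the XLSX spreadsheet, which shows how one tag can link resources held in different formats.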

Instead of building a new search engine for each file format on each storage silo, Smart Tagging can handle all formats and make any file type easily searchable in a unified way. Users can manually augment the automatically generated tags to improve future retrieval (although our current submission disables manual tag management because the demo account is public).

Meta is designed to take you directly to the data so that you can use it in the tools that you’re most familiar with, whether that’s Google Earth, Microsoft Excel, or the ArcGIS web map viewer.

With more time, we would:

  • Extend the number and quality of previewable formats

  • Extend the number of file formats that we extract metadata for

  • Talk to NGA users and use their feedback to optimize Meta for specific user workflows and intelligence tasks

