SearchStax Solr

SearchStax is a managed, hosted Solr service. As of this writing we are strongly considering moving our Solr there.

Log in to SearchStax

The SearchStax console is at https://app.searchstax.com

The kind of SearchStax account we are using only allows two SearchStax console logins, so we are sharing a digital-tech@sciencehistory.org login. The password can be found on P drive at P:\Support\Computer Services\Digital Collections\searchstax_credentials.txt

Beware of Sendio spam protection accidentally trapping communications from SearchStax meant for digital-tech@sciencehistory.org.

Solr Auth Protection

It is important that Solr not be accessible to the public. By default the public would have access to Solr admin functions, and perhaps to information in the Solr index that is meant to be visible only to some users.

SearchStax allows protection based on IP address or password (HTTP basic auth). We do not rely on IP address restrictions, which would not be feasible in a Heroku environment (where we have no persistent, unique-to-us IP addresses). Instead we use only HTTP basic auth.

You can configure as many Solr auth accounts as you want; these are different from the SearchStax console accounts. You do so in the SearchStax console, after selecting a specific deployment (e.g. staging or production), at the Security / Auth menu. SearchStax offers three permission levels: read (without write); write (without read!); or “Read, Write, Admin”. Only the last is really useful to us generally. This is fine: our app previously had complete access to Solr, and it still does.

We configure a single account, scihist_digicoll, whose password is also at P:\Support\Computer Services\Digital Collections\searchstax_credentials.txt (different passwords should be used for production and staging). The username or password can be changed whenever you want, as long as you update the app config to match!

To access the Solr admin pages (URL found via the SearchStax console), you also need a Solr auth account like this. You can re-use the account we use for the app, or create an account for yourself in the SearchStax console.
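To make the basic auth requirement concrete, here is a minimal Ruby sketch that builds (without sending) a request carrying Solr auth credentials. The host is the placeholder used elsewhere on this page, the password is made up, and the admin/info/system path is only an assumed example of an admin endpoint. None of this is taken from our app's code.

```ruby
require "net/http"
require "uri"

# Build (but do not send) a GET request that carries HTTP basic auth
# credentials. The method name is hypothetical.
def solr_auth_request(url, user, password)
  request = Net::HTTP::Get.new(URI(url))
  request.basic_auth(user, password) # sets the Authorization header
  request
end

# Placeholder host, path, and password: assumptions for illustration only.
request = solr_auth_request(
  "https://ssNNNNNN-aaaaaaa-us-east-1-aws.searchstax.com/solr/admin/info/system",
  "scihist_digicoll",
  "placeholder-password"
)

# Shows the header a client would send: "Basic " plus base64 of "user:password".
puts request["authorization"]
```

A request without this header would be expected to be refused by a deployment that has basic auth enabled.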

SearchStax doc: https://www.searchstax.com/docs/security/

Configuring our Rails App

We configure the app to talk to Solr via a single configuration variable: SOLR_URL in ENV/Heroku config, or solr_url in local_env.yml.

This should include the HTTP basic auth info in the URL. It should also include the name of the app's collection (normally scihist_digicoll) on the end of the URL. For example:

http://scihist_digicoll:$password@ssNNNNNN-aaaaaaa-us-east-1-aws.searchstax.com/solr/scihist_digicoll

You can get the base Solr URL for a deployment (production, staging, etc.) from the SearchStax console. Then add the HTTP basic auth credentials (password can be found at P:\Support\Computer Services\Digital Collections\searchstax_credentials.txt), and add the collection name /scihist_digicoll on the end.

This is the URL to give to the app as SOLR_URL/solr_url config.
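The assembly steps above can be sketched in Ruby as follows. This is only an illustration: build_solr_url is a hypothetical helper, the base URL and password are placeholders, and the sketch assumes the password contains only URL-safe characters (anything else would need percent-encoding first).

```ruby
require "uri"

# Combine the base Solr URL copied from the SearchStax console with basic
# auth credentials and the collection name. Hypothetical helper.
def build_solr_url(base_url, user:, password:, collection:)
  uri = URI.parse(base_url)
  uri.user = user
  uri.password = password # assumes URL-safe characters only
  uri.path = "#{uri.path.chomp('/')}/#{collection}"
  uri.to_s
end

# Placeholder values; use whatever base URL the console shows.
solr_url = build_solr_url(
  "https://ssNNNNNN-aaaaaaa-us-east-1-aws.searchstax.com/solr",
  user: "scihist_digicoll",
  password: "placeholder-password",
  collection: "scihist_digicoll"
)
puts solr_url
```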

The app will use this for reading and writing, as well as for updating Solr configuration.

Updating Solr Configuration

With our previous Solr installation, we provided our Solr configuration files by putting them in a directory on disk that Solr was configured to use, and we updated the files in that directory on every capistrano deploy.

Using SearchStax, we don’t have access to that directory, but we do have access to Solr Cloud APIs for uploading our Solr config directory as a “config set”, and configuring our Solr “collection” to use it.
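For orientation, the standard Solr Cloud admin endpoints involved look roughly like the sketch below, which only builds the URLs and sends nothing. The method names and the example host are hypothetical and this is not our project's actual code. Depending on the Solr version, further parameters may be needed.

```ruby
require "uri"

# Build a Solr Cloud admin API URL under a Solr base URL ending in /solr.
# All method names here are hypothetical.
def solr_admin_uri(base_url, api, params)
  uri = URI.parse("#{base_url.chomp('/')}/admin/#{api}")
  uri.query = URI.encode_www_form(params)
  uri
end

# Configsets API: upload a zipped config directory as a named config set.
# The zip file itself would be sent as the POST body.
def configset_upload_uri(base_url, configset)
  solr_admin_uri(base_url, "configs", { "action" => "UPLOAD", "name" => configset })
end

# Collections API: create a collection that uses the named config set.
def collection_create_uri(base_url, collection, configset, num_shards: 1)
  solr_admin_uri(base_url, "collections", {
    "action" => "CREATE",
    "name" => collection,
    "numShards" => num_shards,
    "collection.configName" => configset
  })
end

# Collections API: reload a collection so it picks up config changes.
def collection_reload_uri(base_url, collection)
  solr_admin_uri(base_url, "collections", { "action" => "RELOAD", "name" => collection })
end

base = "https://solr.example.test/solr" # placeholder host
puts configset_upload_uri(base, "scihist_digicoll")
puts collection_create_uri(base, "scihist_digicoll", "scihist_digicoll")
puts collection_reload_uri(base, "scihist_digicoll")
```

Each of these requests would also need the HTTP basic auth credentials described earlier.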

We have written Ruby code to use those APIs, including some high-level rake tasks:

    # to create a NEW collection from the solr config in the project at ./solr/config
    # using the collection name included in your SOLR_URL.
    # Useful for bootstrapping on a brand new SearchStax or other Solr Cloud deployment
    ./bin/rake scihist:solr_cloud:create_collection

    # In an existing collection, sync the solr config in the project at ./solr/config
    # to the remote Solr Cloud instance configured in SOLR_URL.
    ./bin/rake scihist:solr_cloud:sync_configset

Both of these rely on SOLR_URL being set, both to locate Solr and to supply the HTTP basic auth credentials needed to use the Solr API. It will normally be set in a deployment environment, or you can set it locally, for instance with SOLR_URL=whatever on the command line when you run the rake task.
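Since everything comes from that one value, code using it has to split it back into its parts. A minimal sketch of that split, assuming the URL has the shape described above (credentials, then host, then a path ending in the collection name); SolrLocation and parse_solr_url are hypothetical names, not our actual code:

```ruby
require "uri"

# The pieces a Solr Cloud API client needs, split out of a SOLR_URL-style value.
SolrLocation = Struct.new(:base_url, :collection, :user, :password, keyword_init: true)

# Hypothetical helper: the last path segment is taken to be the collection
# name, and everything before it the Solr base URL (without credentials).
def parse_solr_url(solr_url)
  uri = URI.parse(solr_url)
  *base_segments, collection = uri.path.split("/")
  port = uri.port == uri.default_port ? "" : ":#{uri.port}"
  SolrLocation.new(
    base_url: "#{uri.scheme}://#{uri.host}#{port}#{base_segments.join('/')}",
    collection: collection,
    user: uri.user,         # returned as written in the URL
    password: uri.password  # returned as written in the URL
  )
end

# Placeholder value; in practice this would come from the SOLR_URL setting.
location = parse_solr_url(
  "https://scihist_digicoll:placeholder-password@ssNNNNNN-aaaaaaa-us-east-1-aws.searchstax.com/solr/scihist_digicoll"
)
puts location.base_url
puts location.collection
```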

You can run rake tasks remotely on Heroku with heroku run rake scihist… or via our ansible/EC2 setup with the capistrano task for running rake tasks remotely, or by opening a bash console on either.

The sync_configset task is designed to be run on every deployment, much like db:migrate. It makes sure that any changes in the repo's solr/config directory meant to go along with the deployed version of the code get deployed with that code. However, if there are Solr config changes that require a re-index, you will still need to take care of that “manually”, planning for downtime or figuring out a no-downtime way to do it.

Note: We do not use the SearchStax proprietary API

SearchStax has its own custom API for managing Solr configsets and collections. I am not sure why it exists, as it seems to duplicate what the standard Solr APIs do, while being somewhat harder to use. We do not use it. We use the standard Solr API, which should work on any Solr Cloud instance, not just SearchStax.

SearchStax docs/support say that the SearchStax API is “more secure”, but being protected by Solr basic auth seems sufficient (and the standard Solr APIs are available to anyone with our Solr basic auth credentials whether we use them or not!). They also say that the SearchStax API is logged, or logged differently; I don’t think we care.