Analytics | Amazon CloudSearch Flashcards
What is Amazon CloudSearch?
General
Amazon CloudSearch | Analytics
Amazon CloudSearch is a fully managed service in the AWS Cloud that makes it easy to set up, manage, and scale a search solution for your website or application.
What are the benefits of running a managed search service like Amazon CloudSearch over running my own search service on EC2?
General
Amazon CloudSearch | Analytics
Amazon CloudSearch provides several benefits over running your own self-managed search service including easy configuration, auto scaling for data and traffic, self-healing clusters, and high availability with Multi-AZ. With a few clicks in the AWS Management Console, you can create a search domain and upload the data you want to make searchable, and Amazon CloudSearch automatically provisions the required resources and deploys a highly tuned search index.
What is a search engine?
General
Amazon CloudSearch | Analytics
A search engine makes it possible to search large collections of mostly textual data items (called documents) to quickly find the best matching results. Search requests are usually a few words of unstructured text, such as “matt damon movies”. The returned results are usually ranked with the best matching, or most relevant, items listed first (the ones that are most “about” the search words).
Documents may be completely unstructured, or they can contain multiple fields that can optionally be searched individually. For example, a search service for movies might have documents with fields for title, director, actor, description, and reviews. Results returned by a search engine are typically proxies for the underlying documents, such as URLs that reference particular web pages. However, the search service can also return the actual contents of individual fields.
What benefits does Amazon CloudSearch offer?
General
Amazon CloudSearch | Analytics
Amazon CloudSearch is a fully managed search service that automatically scales with the volume of data and complexity of search requests to deliver fast and accurate results. Amazon CloudSearch lets customers add search capability without needing to manage hosts, traffic and data scaling, redundancy, or software packages. Users pay low hourly rates only for the resources consumed. Amazon CloudSearch can offer significantly lower total cost of ownership compared to operating and managing your own search environment.
Can Amazon CloudSearch be used with a storage service?
General
Amazon CloudSearch | Analytics
A search service and a storage service are complementary. A search service requires that your documents already be stored somewhere, whether it’s in files of a file system, data in Amazon S3, or records in an Amazon DynamoDB or Amazon RDS instance. The search service is a rapid retrieval system that makes those items searchable with sub-second latencies through a process called indexing.
Can Amazon CloudSearch be used with a database?
General
Amazon CloudSearch | Analytics
Search engines and databases are not mutually exclusive; in fact, they are often used together. If you already have a database that contains structured data, you might want to use a search engine to intelligently filter and rank the database contents using search keywords as relevance criteria.
A search service can be used to index and search both structured and unstructured data. Content can come from multiple sources and can include database fields along with files in a variety of formats, web pages, and so on. A search service can support customizable result ranking as well as special search features such as using facets for filtering that are not available in databases.
What regions is Amazon CloudSearch available in?
About the 2013-01-01 API
Amazon CloudSearch | Analytics
Amazon CloudSearch is available in the following AWS Regions: US East (Northern Virginia), US West (Oregon), US West (N. California), EU (Ireland), EU (Frankfurt), South America (Sao Paulo), and Asia Pacific (Singapore, Tokyo, Sydney, and Seoul).
What new features does Amazon CloudSearch support?
About the 2013-01-01 API
Amazon CloudSearch | Analytics
With this latest release Amazon CloudSearch supports several new search and administration features. The key new features include:
Language support:
34 languages, plus “multiple” to handle mixed language fields
Per-field language configuration
Language-specific text analysis
Multiple levels of algorithmic stemming are available for many languages, including “none”
Enhanced search features:
Suggestions
Highlighting
Geospatial search
New data types: date, double, 64-bit signed int, latlon
Sloppy phrase search
Term boosting
Enhanced range searching for all field types
Support for multiple query parsers: simple, structured, lucene, dismax
Query parser configuration options
Administration features:
High availability option
IAM integration
User configurable scaling
Available in additional AWS Regions: Asia Pacific (Tokyo), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Seoul), and South America (Sao Paulo)
Does Amazon CloudSearch still support dictionary stemming?
About the 2013-01-01 API
Amazon CloudSearch | Analytics
Yes. The new version of Amazon CloudSearch supports dictionary stemming in addition to algorithmic stemming.
Does the new version of Amazon CloudSearch use Apache Solr?
About the 2013-01-01 API
Amazon CloudSearch | Analytics
Yes. The latest version of Amazon CloudSearch has been modified to use Apache Solr as the underlying text search engine. Amazon CloudSearch now provides several popular search engine features available with Apache Solr in addition to the managed search service experience that makes it easy to set up, operate, and scale a search domain.
Can I access the new version of Amazon CloudSearch through the console?
About the 2013-01-01 API
Amazon CloudSearch | Analytics
Yes. You can access the new version of Amazon CloudSearch through the console. If you are a current Amazon CloudSearch customer with existing search domains, you have the option to select which version of Amazon CloudSearch you want to use when creating new search domains. New customers will use the new version of Amazon CloudSearch by default and will not have access to the 2011-02-01 version.
What data types does the new version of Amazon CloudSearch support?
About the 2013-01-01 API
Amazon CloudSearch | Analytics
Amazon CloudSearch supports two types of text fields, text and literal. Text fields are processed according to the language configured for the field to determine individual words that can serve as matches for queries. Literal fields are not processed and must match exactly, including case. CloudSearch also supports four numeric types: int, double, date, and latlon. Int fields hold 64-bit, signed integer values. Double fields hold double-precision (64-bit) floating-point values. Date fields hold dates specified in UTC (Coordinated Universal Time) according to IETF RFC3339: yyyy-mm-ddT00:00:00Z. Latlon fields contain a location stored as a latitude and longitude value pair.
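As a rough illustration of the date format described above, the following Python sketch converts a datetime into a UTC timestamp string of that shape. The helper name to_cloudsearch_date is hypothetical, and treating naive datetimes as already being in UTC is an assumption of this sketch, not something the service requires.

```python
from datetime import datetime, timezone


def to_cloudsearch_date(value: datetime) -> str:
    """Format a datetime as a UTC timestamp such as 2013-11-11T00:00:00Z."""
    if value.tzinfo is None:
        # Assumption for this sketch: naive datetimes are already in UTC.
        value = value.replace(tzinfo=timezone.utc)
    # Convert to UTC, then render with the literal "Z" suffix.
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


release = to_cloudsearch_date(datetime(2013, 11, 11))
print(release)
```

A string produced this way could be used as the value of a date field in a document batch.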
Will my existing search domains created with the 2011-02-01 version of Amazon CloudSearch continue to work?
About the 2013-01-01 API
Amazon CloudSearch | Analytics
Yes. Existing search domains created with the 2011-02-01 version of Amazon CloudSearch will continue to work.
Will I be able to use the new features on my existing search domains created with the 2011-02-01 version of Amazon CloudSearch?
About the 2013-01-01 API
Amazon CloudSearch | Analytics
No. Existing search domains created with the 2011-02-01 version of Amazon CloudSearch will not have access to the features available in the new version. To access the new features you will have to create a new search domain using the 2013-01-01 version of Amazon CloudSearch.
How can I migrate my applications built using the 2011-02-01 version of Amazon CloudSearch to the new version of Amazon CloudSearch?
About the 2013-01-01 API
Amazon CloudSearch | Analytics
To use the new version of Amazon CloudSearch you need to recreate existing domains using the new version of Amazon CloudSearch and re-upload your data. For more information, see Migrating to the 2013-01-01 API in the Amazon CloudSearch Developer Guide.
Will AWS continue to support the 2011-02-01 version of Amazon CloudSearch?
About the 2013-01-01 API
Amazon CloudSearch | Analytics
Yes. AWS will continue support for the 2011-02-01 version of Amazon CloudSearch.
Can I create new search domains using the 2011-02-01 version of Amazon CloudSearch?
About the 2013-01-01 API
Amazon CloudSearch | Analytics
Current Amazon CloudSearch customers who have existing 2011-02-01 domains will be able to choose whether their new domains use the 2011-02-01 API or the new 2013-01-01 API. Search domains created by new customers will automatically be created with the 2013-01-01 API.
Can I take advantage of the free trial offer with the new version of Amazon CloudSearch?
Getting Started
Amazon CloudSearch | Analytics
New customers will still be able to take advantage of the free trial offer available with Amazon CloudSearch. See the Amazon CloudSearch Free Trial page for details.
How do I get started with Amazon CloudSearch?
Getting Started
Amazon CloudSearch | Analytics
To sign up for Amazon CloudSearch, click the Create Free Account button on the Amazon CloudSearch detail page and complete the sign-up process. You must have an Amazon Web Services account. If you do not already have one, you will be prompted to create an AWS account when you begin the Amazon CloudSearch sign-up process.
After you have signed up, select Amazon CloudSearch from the AWS Management Console. Using the Amazon CloudSearch console you can quickly create a search domain, configure your search fields, upload sample data, and send search queries to your search domain. You can also use the AWS SDKs and the CLI to perform these operations.
For more information, see the Getting Started tutorial in the Amazon CloudSearch Developer Guide.
Do the AWS SDKs support Amazon CloudSearch?
Getting Started
Amazon CloudSearch | Analytics
Yes, the AWS SDKs for Java, Ruby, Python, .NET, PHP, and Node.js provide support for CloudSearch. Using the AWS SDKs you can quickly create a search domain, configure your search fields, upload data, and send search queries to your search domain.
Does the AWS CLI support Amazon CloudSearch?
Getting Started
Amazon CloudSearch | Analytics
Yes, the AWS CLI provides support for CloudSearch. Using the AWS CLI you can quickly create a search domain, configure your search fields, upload data, and send search queries to your search domain.
Can I still use the Amazon CloudSearch CLTs?
Search Domains, Data, and Indexing
Amazon CloudSearch | Analytics
Yes, the Amazon CloudSearch command line tools (CLTs) will continue to work.
What is a search domain and how do I create one?
Search Domains, Data, and Indexing
Amazon CloudSearch | Analytics
A search domain is a data container and a set of services that make the data searchable. These services include:
A document service that allows you to upload data to your domain for indexing.
A search service that allows you to perform search requests against your indexed data.
A configuration service for controlling your domain’s behavior (including relevance ranking).
You can create, manage, and delete search domains using the AWS Management Console, AWS SDKs, or AWS CLI.
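As one possible illustration of creating a domain through an SDK, the Python sketch below creates a domain and defines a few index fields. It assumes the client object behaves like the one returned by boto3.client("cloudsearch"); the function name, the domain name, and the choice of fields are hypothetical and only mirror the movie example used in this document.

```python
from typing import Any, Dict

# Illustrative index fields for a movie domain (names are assumptions;
# the values are CloudSearch index field types).
MOVIE_FIELDS: Dict[str, str] = {
    "title": "text",
    "plot": "text",
    "genres": "literal-array",
    "year": "int",
    "release_date": "date",
}


def create_movie_domain(client: Any, domain_name: str) -> None:
    """Create a search domain and define one index field per entry above.

    `client` is assumed to behave like boto3.client("cloudsearch").
    """
    client.create_domain(DomainName=domain_name)
    for name, field_type in MOVIE_FIELDS.items():
        client.define_index_field(
            DomainName=domain_name,
            IndexField={"IndexFieldName": name, "IndexFieldType": field_type},
        )


# Usage (requires AWS credentials and the boto3 package):
#   import boto3
#   create_movie_domain(boto3.client("cloudsearch"), "movies")
```

Passing the client in as an argument keeps the sketch independent of how credentials and Regions are configured.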
How do I upload documents to my search domain?
Search Domains, Data, and Indexing
Amazon CloudSearch | Analytics
You upload documents to your domain using the AWS Management Console, AWS SDKs, or AWS CLI.
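For the SDK route, an upload might look roughly like the Python sketch below. It assumes a client that behaves like boto3.client("cloudsearchdomain", endpoint_url=...) pointed at your domain's document endpoint; the function name upload_batch and the endpoint placeholder are hypothetical.

```python
import json
from typing import Any, Dict, List


def upload_batch(doc_client: Any, batch: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Send a list of add/delete operations to the domain's document service.

    `doc_client` is assumed to behave like
    boto3.client("cloudsearchdomain", endpoint_url=<your document endpoint>).
    """
    body = json.dumps(batch).encode("utf-8")
    return doc_client.upload_documents(
        documents=body,
        contentType="application/json",
    )


# Usage (requires AWS credentials and the boto3 package):
#   import boto3
#   client = boto3.client("cloudsearchdomain", endpoint_url="https://<your-doc-endpoint>")
#   result = upload_batch(client, [{"type": "delete", "id": "tt1951264"}])
```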
Do my documents need to be in a particular format?
Search Domains, Data, and Indexing
Amazon CloudSearch | Analytics
To make your data searchable, you need to format your data in JSON or XML. Each item that you want to be able to receive as a search result is represented as a document. Every document has a unique document ID and one or more fields that contain the data that you want to search and return in results. Amazon CloudSearch generates a search index from your document data according to the index fields configured for the domain. As your data changes, you submit updates to add or delete documents from your index.
How do I create document batches formatted for Amazon CloudSearch?
Search Domains, Data, and Indexing
Amazon CloudSearch | Analytics
To create document batches that describe your data, you create JSON or XML text files that specify:
The operation type: add or delete
A unique identifier
The actual fields and their data
The following example shows a single document batch formatted in JSON:
[
  {
    "fields" : {
      "directors" : [
        "Francis Lawrence"
      ],
      "release_date" : "2013-11-11T00:00:00Z",
      "genres" : [
        "Action",
        "Adventure",
        "Sci-Fi",
        "Thriller"
      ],
      "image_url" : "http://ia.media-imdb.com/images/M/MV5xMzzAx._V1_SX400_.jpg",
      "plot" : "Katniss Everdeen and Peeta Mellark become targets of the Capitol after their victory in the 74th Hunger Games sparks a rebellion in the Districts of Panem.",
      "title" : "The Hunger Games: Catching Fire",
      "rank" : 4,
      "running_time_secs" : 8760,
      "actors" : [
        "Jennifer Lawrence",
        "Josh Hutcherson",
        "Liam Hemsworth"
      ],
      "year" : 2013
    },
    "id" : "tt1951264",
    "type" : "add"
  }
]
Note that numeric values such as the year are not enclosed in quotes, and that values in a multi-value field such as genres are listed in a JSON array.
To make this data available to Amazon CloudSearch, you can save it to a file and upload it using the AWS Management Console, AWS SDKs, or AWS CLI.
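If you generate batches programmatically, a minimal Python sketch along these lines could build one add and one delete operation and serialize them to JSON. The helper names add_op and delete_op and the document ID "tt0000001" are hypothetical and used only for illustration.

```python
import json
from typing import Any, Dict, List


def add_op(doc_id: str, fields: Dict[str, Any]) -> Dict[str, Any]:
    """Describe an add operation: operation type, unique ID, and field data."""
    return {"type": "add", "id": doc_id, "fields": fields}


def delete_op(doc_id: str) -> Dict[str, Any]:
    """Describe a delete operation; only the type and the ID are given."""
    return {"type": "delete", "id": doc_id}


batch: List[Dict[str, Any]] = [
    add_op(
        "tt1951264",
        {
            "title": "The Hunger Games: Catching Fire",
            "year": 2013,  # numeric value, so it is not quoted in the JSON
            "genres": ["Action", "Adventure", "Sci-Fi", "Thriller"],  # multi-value field
        },
    ),
    delete_op("tt0000001"),  # hypothetical ID of a document to remove
]

# Serialize the batch; the result can be saved to a file and uploaded.
batch_json = json.dumps(batch, indent=2)
print(batch_json)
```

Building the batch from Python values and letting json.dumps handle quoting avoids hand-editing mistakes such as quoting a numeric field.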
How do my documents get indexed?
Search Domains, Data, and Indexing
Amazon CloudSearch | Analytics
Documents are automatically indexed when you upload them to your search domain. You can also explicitly re-index your documents when you make configuration changes by sending an IndexDocuments request.
When do I need to re-index my domain?
Search Domains, Data, and Indexing
Amazon CloudSearch | Analytics
Certain configuration changes, such as adding a new index field or updating your stemming or stopword dictionaries, do not take effect until your domain is re-indexed. When you have made changes that require indexing, the domain’s status will indicate that it needs to be indexed. You can initiate indexing from the AWS Management Console, AWS SDKs, or AWS CLI.
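The check-then-index flow described above could be sketched in Python as follows. This assumes a client that behaves like boto3.client("cloudsearch"), whose describe_domains response reports a RequiresIndexDocuments flag for each domain; the function name reindex_if_needed is hypothetical.

```python
from typing import Any, List


def reindex_if_needed(client: Any, domain_name: str) -> List[str]:
    """Start indexing only when the domain reports that it needs it.

    Returns the field names being indexed, or an empty list if no
    indexing was started. `client` is assumed to behave like
    boto3.client("cloudsearch").
    """
    statuses = client.describe_domains(DomainNames=[domain_name])["DomainStatusList"]
    if not statuses:
        raise ValueError(f"No such search domain: {domain_name}")
    if not statuses[0]["RequiresIndexDocuments"]:
        return []
    response = client.index_documents(DomainName=domain_name)
    return response.get("FieldNames", [])


# Usage (requires AWS credentials and the boto3 package):
#   import boto3
#   reindex_if_needed(boto3.client("cloudsearch"), "movies")
```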
How do I send search requests to my search domain?
Search Domains, Data, and Indexing
Amazon CloudSearch | Analytics
Every search domain has a REST-based search service with a unique URL (search endpoint) that accepts search requests for its document set. You can send search requests from the AWS Management Console, AWS SDKs, or AWS CLI.
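Because the search service is REST-based, a request is ultimately just a URL with query parameters. The Python sketch below builds such a URL for the 2013-01-01 search API without sending it. The endpoint host is a made-up placeholder, and the function name build_search_url is hypothetical; use the search endpoint shown for your own domain.

```python
from urllib.parse import urlencode

# Hypothetical search endpoint; use the one shown for your own domain.
SEARCH_ENDPOINT = "https://search-movies-example.us-east-1.cloudsearch.amazonaws.com"


def build_search_url(query: str, parser: str = "simple", size: int = 10) -> str:
    """Build a GET URL for the 2013-01-01 search API.

    `parser` selects the query parser (simple, structured, lucene, or dismax).
    """
    params = {"q": query, "q.parser": parser, "size": size}
    return f"{SEARCH_ENDPOINT}/2013-01-01/search?{urlencode(params)}"


url = build_search_url("matt damon movies")
print(url)
```

The resulting URL can be requested with any HTTP client. Whether requests must be signed depends on the access policies configured for the domain.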