Architecture of Information Systems
Search Engine

John Samuel
CPE Lyon

Year: 2018-2019
Email: john(dot)samuel(at)cpe(dot)fr

Creative Commons License

Outline: Search engine

  1. Frontend development
  2. Backend development
  3. Application programming interface
  4. Towards Federated Search Engines

1. Frontend development: Search Engine

Target devices

1. Frontend development: Search Engine

Desktop PC

DuckDuckGo

1. Frontend development: Search Engine

Desktop PC

Qwant

1. Frontend development: Search Engine

Desktop PC

Wikipedia

1. Frontend development: Search Engine

Mobile phones

DuckDuckGo

1. Frontend development: Search Engine

Mobile phones

Google

1. Frontend development: Search Engine

Mobile phones

Wikipedia

1. Frontend development: Features

Autocompletion

Wikipedia Autocomplete (November 2018)
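
Autocompletion can be sketched as a prefix lookup over a sorted list of known titles. The titles and the `suggest` helper below are hypothetical, and this is not how Wikipedia implements it; production engines usually also rank suggestions by popularity, which this sketch does not attempt.

```python
import bisect

# Hypothetical suggestion vocabulary, kept sorted so that all titles
# sharing a prefix sit next to each other.
TITLES = sorted([
    "leonardo da vinci",
    "leonardo dicaprio",
    "leopard",
    "lyon",
    "software",
])


def suggest(prefix, titles=TITLES, limit=5):
    """Return up to `limit` titles that start with `prefix` (case-insensitive)."""
    prefix = prefix.lower()
    # Binary search for the first title that is >= prefix.
    start = bisect.bisect_left(titles, prefix)
    matches = []
    for title in titles[start:]:
        if not title.startswith(prefix):
            break  # sorted order: no later title can match
        matches.append(title)
        if len(matches) == limit:
            break
    return matches
```

Calling `suggest("leo")` returns the three titles beginning with "leo", in alphabetical order.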

1. Frontend development: Features

Autocorrection

Query "Softwre" (Qwant autocorrect, November 2018)

1. Frontend development: Features

Autocorrection

Query "Softwre" (DuckDuckGo, November 2018)
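
One simple way to approximate autocorrection is fuzzy matching against a known vocabulary. The sketch below uses Python's standard `difflib`; the vocabulary, the `correct` helper, and the 0.8 cutoff are assumptions for illustration, not what Qwant or DuckDuckGo actually do.

```python
import difflib

# Hypothetical vocabulary of known query terms.
VOCABULARY = ["software", "hardware", "search", "engine"]


def correct(term, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest known term, or the input unchanged if none is close enough."""
    matches = difflib.get_close_matches(term.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else term
```

With this vocabulary, `correct("softwre")` returns `"software"`, while a term with no close neighbour is returned as typed.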

1. Frontend development: Search Engine

Advanced Search

1. Frontend development: Search Engine Filters

Advanced Search: Filters

More options (filters)

1. Frontend Development: Target Audience

Users of search interfaces

1. Frontend development: Results

Search Results



Leonardo da Vinci (November 2018 Qwant results)

1. Frontend development: Results

Search Results



Leonardo da Vinci (October 2017 Google results)

1. Frontend development: Results

Search Results: Timeline

Artists on Histropedia

1. Frontend development: Results

Search Results: Map

Location of Archaeological sites (Wikidata)

1. Frontend development: Results

Search Results: Map

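Need for Geographical Coordinates

# P1435: heritage designation, Q9259: World Heritage Site
# P625: coordinate location, needed to place each item on the map
SELECT ?item ?location
WHERE {
  ?item wdt:P1435 wd:Q9259;
        wdt:P625 ?location
}
Location of Archaeological sites (Wikidata)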

1. Frontend development: Results

Search paradigms [1]

1. Frontend development: Results

Computational knowledge engine



Wolfram Alpha

1. Frontend development: Results

Search user interfaces

1. Frontend development: Results

Search Results



Leonardo da Vinci (October 2017 Google results)

1. Frontend development

Search interface

Personalized user experience

1. Frontend development: Advanced search (filters)

Filter search results (Multiple boxes)

1. Frontend development: Advanced search (filters)

Filter search results (Multiple boxes)

Why filters?

1. Frontend development: Advanced Search in one-box

Power users

Operators
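
A one-box interface for power users has to separate operators from ordinary terms. The following is a minimal sketch, assuming a `key:value` operator syntax (such as the widely used `site:` operator) and double-quoted phrases; the `parse_query` function and its output layout are hypothetical.

```python
import re

# One token is either key:value, a "quoted phrase", or a bare term.
TOKEN = re.compile(r'(\w+):(\S+)|"([^"]+)"|(\S+)')


def parse_query(query):
    """Split a one-box query into operators, exact phrases and plain terms."""
    parsed = {"operators": {}, "phrases": [], "terms": []}
    for key, value, phrase, term in TOKEN.findall(query):
        if key:
            parsed["operators"][key] = value
        elif phrase:
            parsed["phrases"].append(phrase)
        else:
            parsed["terms"].append(term)
    return parsed
```

For example, `parse_query('"leonardo da vinci" site:wikipedia.org painting')` yields one phrase, one `site` operator, and one plain term.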

1. Frontend development: Advanced Search in one-box

Power users: Mnemonics

Bangs (DuckDuckGo)
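
A bang is a short mnemonic prefix that redirects the query to another site's search. The sketch below shows the idea with a small lookup table; the table contents, URL templates, and the `resolve_bang` helper are illustrative assumptions and not DuckDuckGo's actual configuration.

```python
from urllib.parse import quote_plus

# Hypothetical bang table: mnemonic -> search URL template.
BANGS = {
    "w": "https://en.wikipedia.org/w/index.php?search={}",
    "gh": "https://github.com/search?q={}",
}


def resolve_bang(query):
    """Return a redirect URL if the query starts with a known bang, else None."""
    if not query.startswith("!"):
        return None
    # "!w Leonardo da Vinci" -> bang "w", terms "Leonardo da Vinci"
    bang, _, terms = query[1:].partition(" ")
    template = BANGS.get(bang.lower())
    if template is None:
        return None
    return template.format(quote_plus(terms.strip()))
```

Queries without a bang, or with an unknown one, return `None` so the caller can fall back to a normal search.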

1. Frontend development: Personalized user experience

Context: Time and location (Internationalization)

Weather (weather.com)

1. Frontend development: Personalized user experience

Past user search queries

User privacy

2. Backend development

  1. Data collection
  2. Data storage
  3. Configuration
  4. Logging
  5. Dashboard
  6. Security

2.1 Backend development: Data collection

Data ownership

Data model (Data and Schema)

2.1 Backend development: Data collection

Data sources

2.1 Backend development: Data collection

Data acquisition

2.1 Backend development: Data collection

API
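import requests

# Root endpoint of the GitHub REST API; it returns a JSON index of available routes.
url = "https://api.github.com/"

# A timeout avoids waiting forever if the server does not answer.
response = requests.get(url, timeout=10)
response.raise_for_status()  # fail early on HTTP errors (4xx/5xx)
print(response.json())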

2.1 Backend development: Data collection

Open Data
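from SPARQLWrapper import SPARQLWrapper, JSON

# Public SPARQL endpoint of the Wikidata Query Service.
sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
# Select ten items that are an instance of (P31) programming language (Q9143).
sparql.setQuery("""
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q9143 .
}
LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# Each binding maps a variable name to its value, here the item URI.
for result in results["results"]["bindings"]:
    print(result)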

2.1 Backend development: Data collection

Linked Open data cloud

2.1 Backend development: Data collection

Archived and Historical Data

2.1 Backend development: Data collection

Data cleaning and transformation

2.2 Backend development: Data storage

2.2. Backend development: Data Model

2.2. Document indices and Query Optimization

Document indices

Database Indexation

Query Optimization
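
A document index is often built as an inverted index, which maps each term to the documents containing it so that a query does not have to scan every document. The following is a minimal sketch with whitespace tokenization; the sample documents and the `build_index` and `search` helpers are hypothetical.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every query token (AND semantics)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)


# Hypothetical sample corpus.
DOCS = {
    1: "Leonardo da Vinci painted the Mona Lisa",
    2: "The Last Supper by Leonardo",
    3: "Search engines index documents",
}
INDEX = build_index(DOCS)
```

A lookup such as `search(INDEX, "leonardo mona")` intersects two small posting sets instead of reading all three documents.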

2.2. Backend development: Caching
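
Caching keeps the answers to recent queries in memory so that repeated queries skip the expensive lookup. A minimal sketch using the standard `functools.lru_cache` follows; the `cached_search` function and its placeholder body are assumptions, and a real cache would also need an expiry policy so stale results are refreshed.

```python
from functools import lru_cache

# Counts how often the slow path actually runs (for demonstration only).
CALLS = {"count": 0}


@lru_cache(maxsize=1024)
def cached_search(query):
    """Return results for `query`, reusing earlier answers when possible."""
    CALLS["count"] += 1
    # Placeholder for an expensive index or database lookup (hypothetical).
    # A tuple is returned so that cached values cannot be mutated by callers.
    return tuple(sorted(query.lower().split()))
```

Calling `cached_search` twice with the same query runs the slow path once; `cached_search.cache_info()` reports hits and misses.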

2.2. Backend development: Replication and Backup

Replication (Master-slave)

2.3. Resource management and configuration

Availability (Wikipedia)

2.3. Deployment

2.3. Packaging

2.3. Load balancing
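
Round-robin is one of the simplest load-balancing strategies: each incoming request goes to the next backend in a fixed rotation. The sketch below is illustrative only; the `RoundRobinBalancer` class and the backend names are hypothetical, and it ignores health checks and server load.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in a fixed rotation."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def next_backend(self):
        """Return the backend that should serve the next request."""
        return next(self._cycle)
```

With three backends, the fourth request goes back to the first one.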

2.3. Selective Testing

A/B Testing
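
In A/B testing, each user should consistently see the same variant. One common approach, sketched here under assumed names, hashes the user and experiment identifiers to pick a variant deterministically without storing any assignment; `assign_variant` and its parameters are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("A", "B")):
    """Deterministically map a user to one variant of an experiment."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # The hash is spread roughly uniformly, so variants get similar shares.
    return variants[int(digest, 16) % len(variants)]
```

Including the experiment name in the hash means a user who lands in variant A for one experiment is not automatically in A for every other experiment.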

2.4. Backend development: Logging

2.4. Logging

Why logs?

2.5. Backend development: Dashboard

Wikimedia (Grafana: 5th October 2017)

2.5. Backend development: Dashboard

Wikimedia (Availability: 5th October 2017)

2.6. Backend development: Security

Login (Wikipedia)

2.6. Backend development: Security

OpenID
Mozilla Persona (2011-2016)

2.6. Detecting security vulnerabilities

3. Application programming interface

3. API: Data formats

3. API: (CRUDL) Operations
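
The five CRUDL operations (create, read, update, delete, list) can be illustrated with a small in-memory store. The `DocumentStore` class below is a hypothetical sketch; an actual API would expose these operations over HTTP and persist the data.

```python
import itertools


class DocumentStore:
    """In-memory resource collection supporting the CRUDL operations."""

    def __init__(self):
        self._items = {}
        self._ids = itertools.count(1)

    def create(self, data):
        """Store a new item and return its generated id."""
        item_id = next(self._ids)
        self._items[item_id] = dict(data)
        return item_id

    def read(self, item_id):
        """Return a copy of one item; raises KeyError if it does not exist."""
        return dict(self._items[item_id])

    def update(self, item_id, changes):
        """Apply partial changes to an existing item."""
        self._items[item_id].update(changes)

    def delete(self, item_id):
        """Remove an item; raises KeyError if it does not exist."""
        del self._items[item_id]

    def list_ids(self):
        """Return the ids of all stored items in ascending order."""
        return sorted(self._items)
```

In REST-style APIs these operations are commonly mapped to POST, GET, PUT or PATCH, DELETE, and a GET on the whole collection.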

3. API: Examples

GitHub API: Repository Search

3. API: Examples

GitHub API: Pagination
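
The GitHub REST API splits long result lists into pages and announces the next page in the `Link` response header. A minimal sketch of following such links is shown here; `iterate_pages` and `fetch_with_requests` are hypothetical helpers, and the sketch assumes an endpoint whose JSON body is a list of items.

```python
def iterate_pages(fetch_page, first_url, max_pages=10):
    """Yield items page by page, following 'next' links until none remain.

    `fetch_page(url)` must return a pair (items, next_url_or_None).
    `max_pages` caps the number of requests to respect usage limits.
    """
    url = first_url
    for _ in range(max_pages):
        if url is None:
            break
        items, url = fetch_page(url)
        yield from items


def fetch_with_requests(url):
    """Fetch one page over HTTP; the next URL is read from the Link header."""
    # Imported here so the pagination logic above works without the library.
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # requests parses the Link header into the `links` dictionary.
    next_url = response.links.get("next", {}).get("url")
    return response.json(), next_url
```

Keeping the fetching function separate makes the loop easy to try out with a fake `fetch_page` before pointing it at a live API.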

3. API: Data dumps

3. Interface definition

3. Human readable Documentation

  1. Read documentation
  2. Develop application to integrate
  3. Add business logic, if any

3. Machine-readable Documentation

  1. Fully autonomous solution to integrate
  2. Add business logic, if any

3. Quality of service

Resource usage limits
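
Resource usage limits are frequently enforced with a token bucket: each client may spend tokens in bursts up to a capacity, and tokens are refilled at a steady rate. The sketch below is one possible implementation under assumed names (`TokenBucket`, `allow`); the clock is injectable so the behaviour can be checked without waiting.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        """Return True and consume a token if the request is within the limit."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill according to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

An API server would typically keep one bucket per client key and answer rejected requests with an error that tells the client to retry later.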

3. Security

OAuth

4. Towards Federated Search

Federated Search

Project: Federated Search

Virtual Library

Project: Federated Search

Virtual Library: 4 search engines
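
A federated search sends the same query to several engines and merges their ranked results into one list. The following sketch interleaves results by rank and drops duplicates; the `federated_search` function is hypothetical, and each engine is assumed to be a callable returning a ranked list of result URLs.

```python
from itertools import zip_longest


def federated_search(query, engines):
    """Query every engine and interleave the ranked results without duplicates."""
    result_lists = [engine(query) for engine in engines]
    merged, seen = [], set()
    # Take the first result of every engine, then the second, and so on.
    for rank_group in zip_longest(*result_lists):
        for url in rank_group:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

Interleaving by rank is only one merging policy; it treats all engines as equally trustworthy, which may not suit every target audience.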

Project: Federated Search

Task 1: identify your target audience

Target audience

Project: Federated Search

Task 2: Build your search interface

Project: Federated Search

Task 3: Build your results interface

Project: Federated Search

Task 4: Infrastructure using message queue
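
To illustrate the message-queue idea, the sketch below uses Python's in-process `queue.Queue` as a stand-in for a real message broker, which the task does not name. Worker threads take (engine name, engine) messages from a task queue and publish results to a second queue; `run_federated_query` and the engine callables are hypothetical.

```python
import queue
import threading


def run_federated_query(query, engines, workers=2):
    """Dispatch one query to several engines through a task queue."""
    tasks = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work for this worker
                tasks.task_done()
                break
            name, engine = item
            try:
                results.put((name, engine(query)))
            except Exception:
                # A failing engine must not block the others.
                results.put((name, []))
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for item in engines.items():
        tasks.put(item)
    for _ in threads:
        tasks.put(None)
    tasks.join()
    for thread in threads:
        thread.join()

    collected = {}
    while not results.empty():
        name, hits = results.get()
        collected[name] = hits
    return collected
```

Because producers and workers only share the queues, the same structure carries over when the queue is replaced by a networked broker and the workers run on separate machines.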

References

  1. Rahman, Mahmudur. “Search Engines Going beyond Keyword Search: A Survey.” International Journal of Computer Applications, vol. 75, no. 17, Aug. 2013, pp. 1–8.
  2. Marchionini, Gary. “Exploratory Search: From Finding to Understanding.” Communications of the ACM, vol. 49, no. 4, Apr. 2006, p. 41.
  3. Flickner, M., et al. “Query by Image and Video Content: The QBIC System.” Computer, vol. 28, no. 9, Sept. 1995, pp. 23–32.

Image credits