Searching 20,000 Text Files With SQLite FTS5 and Flask

I recently needed to search through a collection of 20,000 text files. After trying basic tools like find and grep, I decided to give sqlite3 a try.

Data Preparation

Using a Jupyter Notebook, I cleaned the documents and imported them into a sqlite3 database. I used Beautiful Soup to extract text from both HTML and plain text documents. The entire cleaning process took around 5 hours. Once I had everything figured out, the actual database import step took about 30 seconds.
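The clean-and-import step described above might look roughly like the following. This is a minimal sketch, not the notebook code from this project: it swaps Beautiful Soup for the standard library's html.parser so it runs without extra packages, and the docs table, its columns, and the function names are all made up for illustration.

```python
import sqlite3
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def extract_text(raw):
    """Return whitespace-normalized text from HTML or plain text."""
    parser = TextExtractor()
    parser.feed(raw)
    parser.close()  # flush any text still buffered by the parser
    return " ".join(" ".join(parser.parts).split())


def import_documents(db_path, folder):
    """Clean every file under `folder` and store it in a hypothetical `docs` table."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS docs (path TEXT PRIMARY KEY, body TEXT)")
    rows = (
        (str(p), extract_text(p.read_text(errors="replace")))
        for p in sorted(Path(folder).rglob("*"))
        if p.is_file()
    )
    with conn:  # a single transaction keeps the bulk insert fast
        conn.executemany("INSERT OR REPLACE INTO docs VALUES (?, ?)", rows)
    return conn
```

Wrapping all the inserts in one transaction is the main reason a bulk import like this can finish in seconds rather than minutes.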

Full Text Search with FTS5

I created an FTS5 virtual table to enable fast full-text search with boolean operators, phrase queries, and prefix matching. This allowed me to quickly find relevant files.
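For readers who have not used FTS5, here is a small sketch of what such a table and a query helper could look like from Python. The table name docs_fts, its columns, and the sample rows are assumptions for illustration, and the sketch assumes the sqlite3 module was built with FTS5 enabled, which is common but not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `path` is stored but not indexed; only `body` is searchable.
conn.execute("CREATE VIRTUAL TABLE docs_fts USING fts5(path UNINDEXED, body)")
conn.executemany(
    "INSERT INTO docs_fts (path, body) VALUES (?, ?)",
    [
        ("notes/a.txt", "SQLite makes full text search simple"),
        ("notes/b.txt", "Flask serves a simple search page"),
        ("notes/c.txt", "Jupyter notebooks for cleaning text"),
    ],
)


def search(conn, query, limit=10):
    """Return (path, snippet) pairs for an FTS5 query, best match first."""
    sql = """
        SELECT path, snippet(docs_fts, 1, '[', ']', '...', 8)
        FROM docs_fts
        WHERE docs_fts MATCH ?
        ORDER BY rank
        LIMIT ?
    """
    return conn.execute(sql, (query, limit)).fetchall()
```

With this sample data, search(conn, "simple AND search") matches a.txt and b.txt, search(conn, "simple NOT flask") matches only a.txt, and the prefix query search(conn, "clean*") matches c.txt.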

I wrote a second Jupyter Notebook to provide a simple notebook cell based UI for querying and retrieving documents. That worked, but I wanted to make things even more convenient.

Web-Based Search UI with Flask

I built a simple web-based search UI using Flask. To create and debug the Flask server, I used Visual Studio Code, following its excellent Flask tutorial.

Lessons Learned

  • Jupyter Notebooks are a useful tool for data cleaning and import.
  • SQLite and FTS5 work well for simple document search applications.
  • Flask is an easy-to-use library for simple web servers.
  • Visual Studio Code is great for creating and debugging Flask servers.

Room for Improvement

  • The UI could be improved by using a design system like Bootstrap.

Overall, I was able to create a functional search tool that can handle my collection of text files. It took about 10 hours of work, and I learned a few things along the way.

Two thumbs up, would hack again!

Concordia from DeepMind

I tried out the Concordia LLM-driven agent framework.

I ran it on my own MacBook, using Ollama to serve a variety of LLMs. I used Visual Studio Code to run the Concordia example notebooks, and venv to manage the Python virtual environment.

To adapt the notebook to use Ollama, I first ran ollama pull <model-name> to install each model locally, then I changed

from concordia.language_model import gpt_model

model = gpt_model.GptLanguageModel(api_key=GPT_API_KEY,
                                   model_name=GPT_MODEL_NAME)

to

from concordia.language_model import ollama_model

model = ollama_model.OllamaLanguageModel(model_name=OLLAMA_MODEL_NAME)

I tried gemma2:9b, gemma2:27b, and mixtral:8x7b.

gemma2:9b worked, but had trouble adhering to the prompts.

gemma2:27b is currently broken in Ollama 0.1.48: it doesn’t stay on topic. It is supposedly going to be fixed in 0.1.50.

Mixtral ran too slowly to be useful.

None of these models gave me a good result, though trying more models might turn up one that works.

Overall the concept seems promising. It’s kind of the authors to make their toolkit available!

iA Presenter Micro-Review

I just finished beta testing the iA Presenter Markdown-based presentation software.

tl/dr: It’s really good! But it’s not for me.

What it is: A macOS app that enables you to quickly create great-looking slideshow presentations using a slightly enhanced version of Markdown.

Why I liked it:

  • Delightfully easy to use
  • Gorgeous presentation defaults
  • Opinionated tutorial, teaches you to make better presentations

Why it’s not for me:

  • My workplace uses Google Slides.
  • I don’t create presentations except for work.
  • While iA Presenter can be used to create Markdown-based blog posts, it’s priced for professional users, which makes it too expensive for hobby use.

Anyway, I had fun using it, and I wish iA well in their launch and in their future endeavors.

Update June 2024

I went ahead and bought it, to do a work presentation. Expensive, but worth it.

The HD 4chan Browser

I wrote HD, a small SwiftUI app to browse the 4chan image board on an iPhone or iPad.

I’m proud of how nice the app is to use, and how fast it displays images, animations and videos.

More …

CS admissions, Fall 2022

tl/dr advice to kids and parents aiming for admission to a good undergraduate CS program in 2022-2023:

  • Research what your target schools are looking for.
  • Calculate the cost/benefit ratio.
  • Have a back-up plan.

Having just helped my three kids get into undergraduate CS programs at, respectively, a top-10, a top-25, and a top-35 school, I want to share my family’s experience.

More …

A Solver for Hitman Go Levels

I wrote a solver for Hitman Go levels. You can use it to:

  • Solve existing levels.
  • Design and test your own levels.
  • Study graph search algorithms like A-Star.

The code is available in four related projects:

Screenshot of Spy Puzzle App

In this screenshot of a simple test level, the “Agent” needs to pick up a red key to open the red door, followed by the blue key to open the blue door. The solver has correctly found the moves required to complete the level: “east, east, south”.
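To give a feel for the keys-and-doors search, here is a minimal A-Star sketch in Python. It is not the solver's actual code: the grid level, the character encoding (A = agent, X = exit, r/b = keys, R/B = doors, # = wall), and all function names are invented for illustration, and it ignores everything else a real level may contain.

```python
import heapq

# Hypothetical level, not taken from the game or from the screenshot.
LEVEL = [
    "A.R.BX",
    "r##b##",
]
MOVES = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}


def find(level, ch):
    """Return the (row, column) of the first cell containing `ch`."""
    for r, row in enumerate(level):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(f"{ch!r} not in level")


def solve(level):
    """Return a shortest list of moves from 'A' to 'X', or None."""
    start, goal = find(level, "A"), find(level, "X")

    def heuristic(pos):
        # Manhattan distance never overestimates on a 4-connected grid.
        return abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])

    # A search state is (position, keys held); keys change what is reachable.
    first = (start, frozenset())
    frontier = [(heuristic(start), 0, 0, first, [])]
    best_cost = {first: 0}
    tie = 0  # unique counter so the heap never has to compare states
    while frontier:
        _, cost, _, (pos, keys), path = heapq.heappop(frontier)
        if pos == goal:
            return path
        if cost > best_cost[(pos, keys)]:
            continue  # stale heap entry
        for name, (dr, dc) in MOVES.items():
            r, c = pos[0] + dr, pos[1] + dc
            if not (0 <= r < len(level) and 0 <= c < len(level[r])):
                continue
            cell = level[r][c]
            if cell == "#":
                continue
            if cell in "RB" and cell.lower() not in keys:
                continue  # locked door
            new_keys = keys | {cell} if cell in "rb" else keys
            state = ((r, c), new_keys)
            if cost + 1 < best_cost.get(state, float("inf")):
                best_cost[state] = cost + 1
                tie += 1
                heapq.heappush(
                    frontier,
                    (cost + 1 + heuristic((r, c)), cost + 1, tie, state, path + [name]),
                )
    return None
```

For the level above, solve(LEVEL) returns south, north, east, east, east, south, north, east, east: the agent detours for each key before passing the matching door. Folding the set of held keys into the search state is what lets plain A-Star handle doors without any special cases.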

More …

Calming Ripples App

Calming Ripples is a SwiftUI app that lets you draw animated ripples.

Available for iOS, iPadOS and macOS.

Study the source code to learn these techniques:

  • Handle multi-finger touch events.
  • Draw complex 2D designs using the SwiftUI Canvas view.
  • Animate using the SwiftUI TimelineView view.
  • Use the onChanged() view method to create a dynamically changing animation.

You could use the techniques in this project to create a 2D game.

Screenshot of Calming Ripples App

More …

Tailscale and Tablo

My son’s away at college, without a TV. He wanted to watch the Super Bowl. We realized that one way to do that would be for him to access our family’s Tablo DVR remotely. Tablo supports remote access, but there’s a catch: The client software has to be set up while the Tablo device and the client machine are on the same local network.

But my son was 1400 miles away.

This seemed like a good opportunity to experiment with a Tailscale private network. And therein lies a tale.

More …

Animating along a SwiftUI Path

We can use trigonometry and finite differences to animate rigid objects along a SwiftUI path.

The SwiftUI Path struct is missing several useful methods for evaluating properties of a path:

  • finding the position (as a CGPoint) at a given fractional position along the path.
  • finding the heading (as an angle) at a given fractional position along the path.
  • finding the total length of the path, measured in points.

Happily, we can write these methods based on the existing trimmedPath method.

With the aid of these methods it’s possible to create animations that move rigid bodies along arbitrary paths.
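The finite-difference idea is independent of SwiftUI, so here is a small sketch of it in Python rather than Swift. The point_at argument is a hypothetical stand-in for whatever returns the position at a fractional distance along the path (in SwiftUI that would be built on trimmedPath, as noted above), and quarter_circle is just an example path.

```python
import math


def heading_at(point_at, t, eps=1e-3):
    """Estimate the direction of travel at fraction t with a finite difference."""
    t0, t1 = max(0.0, t - eps), min(1.0, t + eps)
    (x0, y0), (x1, y1) = point_at(t0), point_at(t1)
    return math.atan2(y1 - y0, x1 - x0)


def path_length(point_at, samples=1000):
    """Approximate the total length by summing short straight segments."""
    pts = [point_at(i / samples) for i in range(samples + 1)]
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))


def quarter_circle(t, radius=100.0):
    """Example path: a quarter circle from (radius, 0) to (0, radius)."""
    angle = t * math.pi / 2
    return (radius * math.cos(angle), radius * math.sin(angle))
```

To move a rigid object along the path, place it at point_at(t) and rotate it by heading_at(point_at, t) as t advances from 0 to 1. The estimated length is useful for converting a speed in points per second into a rate of change for t.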

More …