I'm providing a map-based search for paddle boards, including searching by date - think "check in" / "check out" dates in a hotel search, where only hotels available for that entire date range are included in the results.
I have availability data for a couple years, (updated daily) for each paddle board. A paddle board is either available on a day (for 12 noon pick-up) or not available (meaning somebody is picking it up at 12 noon and will drop it off the next morning).
There are hundreds of thousands of paddle boards, so storing separate records for each day's availability for each is untenable (e.g., 350k paddle boards w/ 2 years availability data results in over 250 million rows).
I'm using Postgres 15, and I can add extensions, if necessary. I also have Redis available in the stack, in case the best solution involves non-RDBMS.
The app is very read-heavy, so large / expensive indexes are OK, within reason.
My initial idea for storage and efficient searching involves storing availability data in a char field, for each paddle board, in a format like this: AAANNAAAANA
(730 chars long, where A
represents "available" and N
represents "not available", with a static starting date (e.g., Jan 1, 2023)). Given a search date range (e.g., pick up on Jan 20 / drop off on the morning of Jan 25), I can construct a regexp search (e.g., ^.{19}A{5}
) and achieve decent performance... but... I feel like there's something more ideal.
For example, if I was working with a nested tree structure and wanted to get all offspring nodes of a given node, I could use an algorithm, like Modified Preorder Tree Traversal, to make easy work of an otherwise nightmarishly slow problem. Is there no such concept / mathematical solution to the kind of problem I'm describing?
My question is, aside from the regexp solution I've come up with, are there any algorithmic solutions that are widely used to solve problems like this? For example, I looked into segment trees and some other data structures, not completely diving into any particular one, but I couldn't find anything that I deemed to be at least a semi-ideal solution to the problem. I'm looking for a solution that wouldn't involve really low-level hacking; something that's relatively expressive / maintainable, and where the main filtering work takes place in the database (vs in application space).
My initial idea for storage and efficient searching involves storing availability data in a char field, for each paddle board, in a format like this:
AAANNAAAANA
(730 chars long, whereA
represents "available" andN
represents "not available", with a static starting date)
Using a character string for this is rather unwieldy. Nicest to work with, and following the best practice of database normalisation, would be to store the availability data in a separate table - with one entry per availability, per non-availability, or per date for which the status is known, depending on your modeling.
However, for your use case - in particular, never updating individual availabilities but always the whole series, always reading the whole (or at least a large part of the) series, and not ever needing to store additional data per checkout - a non-atomic representation might be ok and likely even faster.
date[]
, each representing a non-availability (which I guess are less common than availabilities). This probably takes much more space, but you can build a GIN index to search for arrays containing specific dates or overlapping with specific date arrays.datemultirange
would be fitting especially if most checkouts of boards are for multiple sequential days, not just a single day. They offer similarly useful operators, and can be indexed to search for containment and overlaps with other (multi)ranges.Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments