Efficient storage / search algorithm for date range availability

orokusaki

Background

I'm providing a map-based search for paddle boards, including searching by date - think "check in" / "check out" dates in a hotel search, where only hotels available for that entire date range are included in the results.

I have availability data for a couple years, (updated daily) for each paddle board. A paddle board is either available on a day (for 12 noon pick-up) or not available (meaning somebody is picking it up at 12 noon and will drop it off the next morning).

There are hundreds of thousands of paddle boards, so storing separate records for each day's availability for each is untenable (e.g., 350k paddle boards w/ 2 years availability data results in over 250 million rows).

Technology

I'm using Postgres 15, and I can add extensions, if necessary. I also have Redis available in the stack, in case the best solution involves non-RDBMS.

The app is very read-heavy, so large / expensive indexes are OK, within reason.

Initial Idea

My initial idea for storage and efficient searching involves storing availability data in a char field, for each paddle board, in a format like this: AAANNAAAANA (730 chars long, where A represents "available" and N represents "not available", with a static starting date (e.g., Jan 1, 2023)). Given a search date range (e.g., pick up on Jan 20 / drop off on the morning of Jan 25), I can construct a regexp search (e.g., ^.{19}A{5}) and achieve decent performance... but... I feel like there's something more ideal.

For example, if I was working with a nested tree structure and wanted to get all offspring nodes of a given node, I could use an algorithm, like Modified Preorder Tree Traversal, to make easy work of an otherwise nightmarishly slow problem. Is there no such concept / mathematical solution to the kind of problem I'm describing?

Question

My question is, aside from the regexp solution I've come up with, are there any algorithmic solutions that are widely used to solve problems like this? For example, I looked into segment trees and some other data structures, not completely diving into any particular one, but I couldn't find anything that I deemed to be at least a semi-ideal solution to the problem. I'm looking for a solution that wouldn't involve really low-level hacking; something that's relatively expressive / maintainable, and where the main filtering work takes place in the database (vs in application space).

Bergi

My initial idea for storage and efficient searching involves storing availability data in a char field, for each paddle board, in a format like this: AAANNAAAANA (730 chars long, where A represents "available" and N represents "not available", with a static starting date)

Using a character string for this is rather unwieldy. Nicest to work with, and following the best practice of database normalisation, would be to store the availability data in a separate table - with one entry per availability, per non-availability, or per date for which the status is known, depending on your modeling.

However, for your use case - in particular, never updating individual availabilities but always the whole series, always reading the whole (or at least a large part of the) series, and not ever needing to store additional data per checkout - a non-atomic representation might be ok and likely even faster.

  • One approach would be to use a bit string, which is smaller and has more appropriate operators available than the character string. Afaics, it does not provide suitable indexing capabilities.
  • Next would be using an array of dates, i.e. date[], each representing a non-availability (which I guess are less common than availabilities). This probably takes much more space, but you can build a GIN index to search for arrays containing specific dates or overlapping with specific date arrays.
  • Finally, Postgres 14 introduced multiranges, representing a set of non-overlapping ranges (or, if you want, a range with gaps). Using a datemultirange would be fitting especially if most checkouts of boards are for multiple sequential days, not just a single day. They offer similarly useful operators, and can be indexed to search for containment and overlaps with other (multi)ranges.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

PHP - Check Date Range/Availability in foreach loop

Search if a date in a range of date is include in a range of date

Implement search with date range

TSQL Search date range

Search with date range in laravel

Search Date Range Outlook

search in a date range

Find availability of storage racks in stores between dates range

Could this recursive binary search algorithm be more efficient?

Efficient design to search for objects by multiple parameters with range

Efficient date range overlap calculation in python?

Elastic search date range aggregation

Date range search in laravel controller

Elastic Search Date Range Query

Check Hall Availability using Date range and Time slot

Equal date is not displayed in date range search

I want to perform time range search on date range in azure search

Efficient algorithm to find the count of numbers that are divisible by a number without a remainder in a range

What's the most efficient algorithm to calculate the LCM of a range of numbers?

Please tell me the efficient algorithm of Range Mex Query

Search records by date range & if date range is empty using SQL

Efficient algorithm to search a buffer for any string from a list

date range search with respect to number of chunk size

Aggregation, Date range query in Elassandra/Elastic Search

Elastic Search Interval in Date Range Query

Elastic search query on indices with some date range

Mongodb + Java Drivers. Search by date range

Elastic search date range with match not working

MongoDB date range search without timestamp

TOP Ranking

  1. 1

    Failed to listen on localhost:8000 (reason: Cannot assign requested address)

  2. 2

    How to import an asset in swift using Bundle.main.path() in a react-native native module

  3. 3

    Loopback Error: connect ECONNREFUSED 127.0.0.1:3306 (MAMP)

  4. 4

    pump.io port in URL

  5. 5

    Spring Boot JPA PostgreSQL Web App - Internal Authentication Error

  6. 6

    BigQuery - concatenate ignoring NULL

  7. 7

    ngClass error (Can't bind ngClass since it isn't a known property of div) in Angular 11.0.3

  8. 8

    Do Idle Snowflake Connections Use Cloud Services Credits?

  9. 9

    maven-jaxb2-plugin cannot generate classes due to two declarations cause a collision in ObjectFactory class

  10. 10

    Compiler error CS0246 (type or namespace not found) on using Ninject in ASP.NET vNext

  11. 11

    Can't pre-populate phone number and message body in SMS link on iPhones when SMS app is not running in the background

  12. 12

    Generate random UUIDv4 with Elm

  13. 13

    Jquery different data trapped from direct mousedown event and simulation via $(this).trigger('mousedown');

  14. 14

    Is it possible to Redo commits removed by GitHub Desktop's Undo on a Mac?

  15. 15

    flutter: dropdown item programmatically unselect problem

  16. 16

    Change dd-mm-yyyy date format of dataframe date column to yyyy-mm-dd

  17. 17

    EXCEL: Find sum of values in one column with criteria from other column

  18. 18

    Pandas - check if dataframe has negative value in any column

  19. 19

    How to use merge windows unallocated space into Ubuntu using GParted?

  20. 20

    Make a B+ Tree concurrent thread safe

  21. 21

    ggplotly no applicable method for 'plotly_build' applied to an object of class "NULL" if statements

HotTag

Archive