You want to find your "tax" related things. Each time you create something, you put it in the "tax" directory. The same with photos: you organize them once in the input stage or periodically, and then you can search for them easily later on.
For text, if the content of the file is what is relevant, grep/ripgrep are your friends.
For file type or name, find/fd are your friends.
Date of creation? A combination of ls and find.
For images, video, and sound it is more difficult (impossible?) to search by content.
File names and directories are simple, and it works as well as you make it work; it does not matter the type or content of the file.
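To make the name/content/date split above concrete, here is a rough Python sketch of what those tools do. The function names (`find_by_name`, `grep_files`, `modified_after`) are made up for illustration, and it uses modification time rather than creation time, since creation time is not available portably.

```python
import re
from pathlib import Path


def find_by_name(root, pattern):
    """Files under root whose name matches a glob pattern (roughly `find -name`)."""
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())


def grep_files(root, regex):
    """(path, line_number, line) for each text line matching regex (roughly `grep -rn`)."""
    compiled = re.compile(regex)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if compiled.search(line):
                hits.append((path, number, line))
    return hits


def modified_after(root, timestamp):
    """Files under root modified after a Unix timestamp (roughly `find -newer`)."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_mtime > timestamp
    )
```

For example, `grep_files("tax", r"refund")` would list every matching line under a `tax` directory, assuming one exists.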
Hey! Thanks for taking the time to comment.
I don't think it's that much magic, given the modern multimodal embedding models available out there.
As you mentioned:
> /…/ The same with photos: you organize them once in the input stage or periodically, and then you can search for them easily later on. /…/
As a hobby photographer, I take lots of photos. For example, I know I've taken photos of my cats, tractors, bridges, forests, etc., but I never bother manually tagging them beyond basic editing (contrast, white balance, etc.).
A system should be able to recognize what's in these photos and allow me to search for them not only by their content but by vibe as well. And once I find a photo I like, I'd really like to see similar photos (this in particular is very helpful for photographers curating their exhibitions). This is possible to achieve these days.
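To give an idea of the "show me similar photos" part: once each photo has an embedding vector, similarity search is mostly a nearest-neighbour lookup. The sketch below assumes the vectors already exist. The file names and three-dimensional vectors are toy values I made up, and a real embedding model would produce far longer vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, index, top_k=3):
    """Names of the top_k entries in index whose vectors are closest to query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical embeddings; in practice these come from an embedding model.
index = {
    "cat_on_sofa.jpg": [0.9, 0.1, 0.0],
    "kitten.jpg": [0.8, 0.2, 0.1],
    "tractor.jpg": [0.1, 0.9, 0.2],
    "bridge.jpg": [0.0, 0.2, 0.9],
}

similar_to_cat = most_similar(index["cat_on_sofa.jpg"], index, top_k=2)
```

A text query works the same way if the model maps text and images into the same vector space: embed the query, then call `most_similar` with that vector.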
Also, I fully understand your point of view on `find`, `fd`, `grep`, `cat`, etc., but in reality it's only us nerds who ever open a terminal.
Something that comes to mind as a "problem" is that popular DBs are not designed to enforce this (at least I do not think so), so you can violate the append-only principle and the DB will let you.
And how difficult is it to migrate to this model, or away from it? Although this is the same "problem" with any model, I suppose.
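On the enforcement point, one partial workaround I can think of is to have the database reject updates and deletes through triggers. Below is a minimal sketch with SQLite from Python's standard library. The `events` table and `open_append_only` helper are hypothetical names, and anyone who can alter the schema can still drop the triggers, so this is a guard rail rather than a guarantee.

```python
import sqlite3


def open_append_only(path=":memory:"):
    """Open a database whose `events` table rejects UPDATE and DELETE."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            payload TEXT NOT NULL
        );
        CREATE TRIGGER IF NOT EXISTS events_no_update
        BEFORE UPDATE ON events
        BEGIN
            SELECT RAISE(ABORT, 'events is append-only');
        END;
        CREATE TRIGGER IF NOT EXISTS events_no_delete
        BEFORE DELETE ON events
        BEGIN
            SELECT RAISE(ABORT, 'events is append-only');
        END;
        """
    )
    return conn


conn = open_append_only()
conn.execute("INSERT INTO events (payload) VALUES (?)", ("created",))
try:
    conn.execute("UPDATE events SET payload = 'edited' WHERE id = 1")
except sqlite3.DatabaseError as exc:
    print("rejected:", exc)
```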
> Notice that in all cases time spent sleeping or otherwise waiting shouldn’t be counted, though you probably shouldn’t do that in your code in the first place.
I thought the time your CPU waits around doing nothing is something you want to measure.
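For what it's worth, the two readings correspond to two different clocks. Here is a small sketch using Python's standard `time` module: `time.perf_counter()` measures wall-clock time including waits, while `time.process_time()` counts only CPU time used by the process and excludes sleep. The helper names (`measure`, `sleepy`, `busy`) are just for illustration.

```python
import time


def measure(func):
    """Run func once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def sleepy():
    # Waits without using the CPU.
    time.sleep(0.2)


def busy():
    # Keeps the CPU working the whole time.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


for func in (sleepy, busy):
    wall, cpu = measure(func)
    print(f"{func.__name__}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

For `sleepy` the wall time should be around 0.2 s while the CPU time stays near zero. Which number you care about depends on whether you are judging the algorithm or the whole program's latency.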
If your algorithm has better "mathematical" scaling with big N but is worse in real-world scenarios, it "failed".
Why are all images just wrong? Text superimposed with shapes, random shapes in random places, and a clock with the 12 in the center instead of in the 12 position.