@shaik-zeeshan
Contributor

related: #1271
/claim #1271

Screen.Recording.2025-02-11.at.3.50.14.PM.1.mp4
Screen.Recording.2025-02-11.at.4.13.59.PM.1.mp4

@vercel

vercel bot commented Feb 11, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| screenpipe | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Feb 11, 2025 10:55am |


// Add conditions for each keyword
if fuzzy_match {
    conditions.extend(keywords.iter().map(|_| "o.text LIKE '%' || ? || '%'"));
Collaborator

isn't FTS like 100000x faster than LIKE?

FTS lookups are O(log(n)), while LIKE '%' || ? || '%' has a leading wildcard and has to scan every row

Contributor Author

yeah you are right

changing the implementation to use FTS
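
A rough sketch of what the FTS switch could look like, assuming an SQLite FTS5 index over the OCR text. The table name `ocr_text_fts`, the helper names, and the choice to join keywords with OR are all assumptions for illustration, not the code in this PR. It builds one MATCH expression from the keywords instead of one LIKE condition per keyword:

```rust
/// Quote one keyword as an FTS5 string: wrap it in double quotes and
/// double any embedded double quote.
fn fts5_quote(keyword: &str) -> String {
    format!("\"{}\"", keyword.replace('"', "\"\""))
}

/// Build a single MATCH expression from all keywords, skipping blank ones.
/// Joining with OR is an assumption; use " AND " to require every keyword.
fn build_fts_match(keywords: &[&str]) -> String {
    keywords
        .iter()
        .map(|k| k.trim())
        .filter(|k| !k.is_empty())
        .map(fts5_quote)
        .collect::<Vec<_>>()
        .join(" OR ")
}

/// Hypothetical query: `ocr_text_fts` is an assumed FTS5 table name.
/// The whole MATCH expression is bound as one parameter.
const SEARCH_SQL: &str = "SELECT rowid FROM ocr_text_fts WHERE ocr_text_fts MATCH ?1";
```

For example, `build_fts_match(&["rent", "lease"])` yields `"rent" OR "lease"`, which would be bound as the single `?1` parameter of `SEARCH_SQL`.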

@louis030195
Collaborator

trying to understand

is it full refactor of timeline or different page in it?

like ideas here, however the query is way too slow to generate suggestion, most people don't have this attention span

how do you extract these logos? this is quite interesting as some people ask for "universal file search" or like typical use case:

"ok im trying to find my rent document is it in dropbox, google drive, gmail, icloud, somewhere else...?"

and also this is a use case for teams, for example if my friend saw a document and i want to find it, it would be so useful to be able to search and find it in seconds

#1293 related

this is kinda out of scope of the initial search idea i wrote but kind of extension of it

we've also explored idea of using models like https://github.com/microsoft/OmniParser to index images/logos more precisely than OCR for the future FYI

PS: cannot test right now - focused on #1318

@shaik-zeeshan
Contributor Author

shaik-zeeshan commented Feb 12, 2025

> is it full refactor of timeline or different page in it?

it is a different page

> however the query is way too slow to generate suggestion, most people don't have this attention span

it is using a local model for suggestions. maybe I could generate suggestions based on app_name without using models, so it doesn't take too long (app name based filtering). also, since this is keyword search, a generated suggestion might not return any data
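
One way the model-free idea could work, as a sketch: rank the app names seen in recent records by frequency and offer the top few as suggestions. The function name `suggest_apps` and the plain slice of names as input are hypothetical; the real data would come from the database.

```rust
use std::collections::HashMap;

/// Return the `limit` most frequent app names, most frequent first.
/// Ties are broken alphabetically so the output is deterministic.
fn suggest_apps(app_names: &[&str], limit: usize) -> Vec<String> {
    // Count occurrences, ignoring blank names.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for name in app_names.iter().map(|n| n.trim()).filter(|n| !n.is_empty()) {
        *counts.entry(name).or_insert(0) += 1;
    }

    // Sort by count descending, then by name ascending.
    let mut ranked: Vec<(&str, usize)> = counts.into_iter().collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));

    ranked
        .into_iter()
        .take(limit)
        .map(|(name, _)| name.to_string())
        .collect()
}
```

Because nothing here calls a model, the suggestions are available immediately, and every suggested app name is known to have at least one record behind it.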

> how do you extract these logos? this is quite interesting as some people ask for "universal file search" or like typical use case:

getting links from text and fetching favicon from google

> "ok im trying to find my rent document is it in dropbox, google drive, gmail, icloud, somewhere else...?"
>
> and also this is a use case for teams, for example if my friend saw a document and i want to find it, it would be so useful to be able to search and find it in seconds

this is cool, maybe we could use models to get related info or use embeddings for this
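
For the logo question above, a minimal sketch of the "links from text, favicon from google" approach. It assumes links appear as whitespace-separated `http(s)://` tokens and that Google's favicon endpoint (`https://www.google.com/s2/favicons` with `domain` and `sz` parameters) is the one being used; the helper names are hypothetical and this is not the PR's code.

```rust
/// Return the host part of an http(s) URL token, or None if it is not one.
fn host_of(token: &str) -> Option<&str> {
    let rest = token
        .strip_prefix("https://")
        .or_else(|| token.strip_prefix("http://"))?;
    // The host ends at the first path, query, fragment or port delimiter.
    let end = rest
        .find(|c: char| matches!(c, '/' | '?' | '#' | ':'))
        .unwrap_or(rest.len());
    // Drop trailing punctuation such as "," or "." picked up from prose.
    let host = rest[..end].trim_end_matches(|c: char| !c.is_ascii_alphanumeric());
    if host.contains('.') {
        Some(host)
    } else {
        None
    }
}

/// Collect unique hosts from whitespace-separated text, in first-seen order.
fn hosts_in_text(text: &str) -> Vec<&str> {
    let mut hosts: Vec<&str> = Vec::new();
    for token in text.split_whitespace() {
        if let Some(host) = host_of(token) {
            if !hosts.contains(&host) {
                hosts.push(host);
            }
        }
    }
    hosts
}

/// Build the favicon URL for a host (assumed endpoint and parameters).
fn favicon_url(host: &str) -> String {
    format!("https://www.google.com/s2/favicons?domain={}&sz=64", host)
}
```

Deduplicating hosts first keeps the number of favicon requests down to one per site rather than one per link.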

@vercel

vercel bot commented Feb 12, 2025

Someone is attempting to deploy a commit to the louis030195's projects Team on Vercel.

A member of the Team first needs to authorize it.

@louis030195
Collaborator

hows the AI settings config?

Screenshot 2025-02-13 at 8 38 39 AM Screenshot 2025-02-13 at 8 38 23 AM

@louis030195
Collaborator

Screen.Recording.2025-02-13.at.8.41.04.AM.mov

also would appreciate if u can fix the top bar navigation

@shaik-zeeshan
Contributor Author

> Screenshot 2025-02-13 at 8 38 23 AM

looking into this

> also would appreciate if u can fix the top bar navigation

sure

@louis030195
Collaborator

just tested finally

  1. can we scroll the same order of the timeline? eg things are from right to left, scrolling down goes left (in the past)

Screenshot 2025-02-15 at 11 51 03 AM

  2. is there a way to highlight on the image in yellow the text, like rewind does? and make it selectable? (how hard would it be? could be another PR)

  3. sometimes frames showed up even though word "matt" was not on the frame, is that a backend issue or?

  4. did not really understand the query generation, also is there a way to see which model is used like robot icon i did in other ui?

  5. maybe add icon to go back to timeline? back arrow or something else

  6. did not understand clearly the date input at first on search page

  7. did not see this feature after a while

image

  8. am a bit concerned by this:

Screenshot 2025-02-15 at 11 57 41 AM

do we have an issue in backend where the frame is desync with the content/ocr/etc.?

i think this feature is not great rn, not sure how to make it better but either hide it for now or figure out something that make sense, how often would people want to get description of the image? maybe can be a way to extract text better than OCR using multimodal? or idk

other issues i encountered:

Screenshot 2025-02-15 at 11 55 50 AM Screenshot 2025-02-15 at 11 55 38 AM Screenshot 2025-02-15 at 11 55 31 AM Screenshot 2025-02-15 at 11 55 24 AM Screenshot 2025-02-15 at 11 55 08 AM Screenshot 2025-02-15 at 11 54 55 AM

i think the suggestion query is kinda awkward for now, maybe we could tune down the scope of this PR to just simple full text search over OCR/transcriptions or iterate on this UX further as you prefer

overall this is really great PR already, just need some adjustment or tuning down the scope as you prefer, happy to increase the bounty or add tip!

@shaik-zeeshan
Contributor Author

shaik-zeeshan commented Feb 16, 2025

> 1. can we scroll the same order of the timeline? eg things are from right to left, scrolling down goes left (in the past)

sure

> 2. is there a way to highlight on the image in yellow the text, like rewind does? and make it selectable? (how hard would it be? could be another PR)

I already did this in code but commented it out because it wasn't accurate

Screen.Recording.2025-02-16.at.3.32.54.PM.1.mp4

> 3. sometimes frames showed up even though word "matt" was not on the frame, is that a backend issue or?

yeah, sometimes the frame doesn't match the content, app_name and window_name

> 5. maybe add icon to go back to timeline? back arrow or something else

I will add this

> 6. did not understand clearly the date input at first on search page

will make it simpler and make the placeholder something understandable

> 8. am a bit concerned by this:
>
> Screenshot 2025-02-15 at 11 57 41 AM
>
> i think this feature is not great rn, not sure how to make it better but either hide it for now or figure out something that make sense, how often would people want to get description of the image? maybe can be a way to extract text better than OCR using multimodal? or idk

we could remove this since the user can see the frame, so it doesn't really feel useful. I only did this because I wanted to show links to the pages in the frame. currently we don't store any links. showing links to the pages present in a frame would be a useful feature

> Screenshot 2025-02-15 at 11 55 50 AM

this is happening with ffmpeg: it is not able to get frames if the video is not finished

> i think the suggestion query is kinda awkward for now, maybe we could tune down the scope of this PR to just simple full text search over OCR/transcriptions or iterate on this UX further as you prefer

yeah I thought this too, creating suggestions for keywords is taking too long and sometimes the suggestion query doesn't return any records.

@shaik-zeeshan
Contributor Author

Screen.Recording.2025-02-17.at.9.05.06.PM.1.mp4

removed query suggestions, added app-based filtering, improved the ux while searching, added keyword highlighting (not sure if we should keep it)

@louis030195
Collaborator

nice!

i guess just remove the yellow highlight feature, not great

regarding links i agree it would be useful, you mean in the OCR? is there a way to extract links without AI? eg regex

i think it would be cool if in the backend we store the URL the user is on while on the browser but we don't have this yet

@shaik-zeeshan
Contributor Author

> i guess just remove the yellow highlight feature, not great

okay

> regarding links i agree it would be useful, you mean in the OCR? is there a way to extract links without AI? eg regex

some browsers don't show the complete url (like arc, safari)
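
A sketch of what AI-free link extraction from OCR text might look like, given that some browsers hide the scheme. It is a heuristic under stated assumptions: tokens are whitespace-separated, a link is anything whose host part looks like a domain with an alphabetic top-level label, and all function names are hypothetical. It will produce false positives such as `notes.txt`.

```rust
/// Heuristic: does `host` look like a domain name such as "mail.google.com"?
fn looks_like_domain(host: &str) -> bool {
    let labels: Vec<&str> = host.split('.').collect();
    if labels.len() < 2 {
        return false;
    }
    let labels_ok = labels
        .iter()
        .all(|l| !l.is_empty() && l.chars().all(|c| c.is_ascii_alphanumeric() || c == '-'));
    // Require an alphabetic top-level label so "3.14" or "v1.2" are rejected.
    let tld = labels[labels.len() - 1];
    labels_ok && tld.len() >= 2 && tld.chars().all(|c| c.is_ascii_alphabetic())
}

/// Check one token, with or without an http(s) scheme.
fn is_link(token: &str) -> bool {
    let rest = token
        .strip_prefix("https://")
        .or_else(|| token.strip_prefix("http://"))
        .unwrap_or(token);
    // Keep only the part before any path, query, fragment or port.
    let host = rest
        .split(|c: char| matches!(c, '/' | '?' | '#' | ':'))
        .next()
        .unwrap_or("");
    looks_like_domain(host)
}

/// Pull link-like tokens out of OCR text, trimming surrounding punctuation.
fn extract_links(ocr_text: &str) -> Vec<String> {
    ocr_text
        .split_whitespace()
        .map(|t| t.trim_matches(|c: char| matches!(c, ',' | '.' | ';' | '(' | ')' | '"')))
        .filter(|t| is_link(t))
        .map(str::to_string)
        .collect()
}
```

Even with this, a truncated address bar only yields the host, so the full page URL would still have to come from the backend.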

> i think it would be cool if in the backend we store the URL the user is on while on the browser but we don't have this yet

waiting for this pr #1381

@louis030195
Collaborator

> > i guess just remove the yellow highlight feature, not great
>
> okay
>
> > regarding links i agree it would be useful, you mean in the OCR? is there a way to extract links without AI? eg regex
>
> some browsers don't show the complete url (like arc, safari)
>
> > i think it would be cool if in the backend we store the URL the user is on while on the browser but we don't have this yet
>
> waiting for this pr #1381

i'm pushing code soon to get the browser url data in the api result, if useful

@louis030195
Collaborator

done

@louis030195
Collaborator

https://cap.so/s/djh6h7dwmtddw5m

for some reason


2025-02-20T16:30:19.143234Z ERROR screenpipe_server::server: Failed to extract frame 92996: ffmpeg process failed: ffmpeg version 7.1 Copyright (c) 2000-2024 the FFmpeg developers
  built with Apple clang version 16.0.0 (clang-1600.0.26.4)
  configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/7.1_4 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags='-Wl,-ld_classic' --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libharfbuzz --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-audiotoolbox --enable-neon
  libavutil      59. 39.100 / 59. 39.100
  libavcodec     61. 19.100 / 61. 19.100
  libavformat    61.  7.100 / 61.  7.100
  libavdevice    61.  3.100 / 61.  3.100
  libavfilter    10.  4.100 / 10.  4.100
  libswscale      8.  3.100 /  8.  3.100
  libswresample   5.  3.100 /  5.  3.100
  libpostproc    58.  3.100 / 58.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/Users/louisbeaumont/.screenpipe/data/monitor_1_2025-02-10_15-56-21.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2mp41
    encoder         : Lavf61.7.100
  Duration: 00:01:00.00, start: 0.000000, bitrate: 848 kb/s
  Stream #0:0[0x1](und): Video: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv, progressive), 3024x1964, 848 kb/s, 0.50 fps, 0.50 tbr, 16384 tbn (default)
      Metadata:
        handler_name    : VideoHandler
        vendor_id       : [0][0][0][0]
        encoder         : Lavc61.19.100 libx265
Stream mapping:
  Stream #0:0 -> #0:0 (hevc (native) -> mjpeg (native))
Press [q] to stop, [?] for help
[vf#0:0 @ 0x124e045f0] No filtered frames for output stream, trying to initialize anyway.
[mjpeg @ 0x136506ff0] Non full-range YUV is non-standard, set strict_std_compliance to at most unofficial to use it.
[mjpeg @ 0x134f056f0] ff_frame_thread_encoder_init failed
[vost#0:0/mjpeg @ 0x134f05420] Error while opening encoder - maybe incorrect parameters such as bit_rate, rate, width or height.
[vf#0:0 @ 0x600001d00140] Task finished with error code: -22 (Invalid argument)
[vf#0:0 @ 0x600001d00140] Terminating thread with return code -22 (Invalid argument)
[vost#0:0/mjpeg @ 0x134f05420] Could not open encoder before EOF
[vost#0:0/mjpeg @ 0x134f05420] Task finished with error code: -22 (Invalid argument)
[vost#0:0/mjpeg @ 0x134f05420] Terminating thread with return code -22 (Invalid argument)
[out#0/image2 @ 0x600001910000] Nothing was written into output file, because at least one of its streams received no packets.

@louis030195
Collaborator

louis030195 commented Feb 20, 2025

did you base your branch on this? #1364

we use png in prod, not sure if related

@shaik-zeeshan
Contributor Author

> https://cap.so/s/djh6h7dwmtddw5m

solving the issues mentioned in the video

the ffmpeg error occurs when the video is still being recorded, or when the video was not saved properly when the server was closed
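
A sketch of one possible guard for the in-progress case only, so frame extraction is skipped for a chunk that is probably still being written. This is an assumption for illustration, not the fix that went into the PR: the function name and the "not modified recently" heuristic are hypothetical, and it cannot detect a file that was truncated when the server was closed.

```rust
use std::fs;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Heuristic readiness check: a video chunk is treated as finished when it
/// exists, is non-empty, and has not been modified for at least `quiet_for`.
fn video_looks_finished(path: &Path, quiet_for: Duration) -> bool {
    let meta = match fs::metadata(path) {
        Ok(meta) => meta,
        Err(_) => return false, // missing or unreadable file
    };
    if meta.len() == 0 {
        return false; // nothing written yet
    }
    // Age of the last write; None if the mtime is unavailable or in the future.
    let age = meta
        .modified()
        .ok()
        .and_then(|modified| SystemTime::now().duration_since(modified).ok());
    match age {
        Some(age) => age >= quiet_for,
        None => false, // be conservative when the age cannot be determined
    }
}
```

A caller could check this before spawning ffmpeg and show a loading placeholder instead of an error while the check fails.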

@louis030195
Collaborator

Screenshot 2025-02-20 at 12 47 11 PM

why are frames failing?

@louis030195
Collaborator

these are old ones, not a video being written - also ideally should show some skeleton or something to show it's loading

i guess for recent frames we should prob leverage the streaming api but it's not ready yet, too experimental

@louis030195
Collaborator

Screenshot 2025-02-20 at 12 48 59 PM

@shaik-zeeshan
Contributor Author

fixed ffmpeg error

> these are old ones, not a video being written - also ideally should show some skeleton or something to show it's loading

sure, will add this

@shaik-zeeshan
Contributor Author

added loading spinner to images

Screen.Recording.2025-02-21.at.11.10.18.AM.1.mp4

@louis030195
Collaborator

/approve

@algora-pbc

algora-pbc bot commented Feb 21, 2025

@louis030195: The claim has been successfully added to reward-all. You can visit your dashboard to complete the payment.

@louis030195
Collaborator

/tip $100 @shaik-zeeshan

amazing!

@louis030195 louis030195 merged commit 67f624b into mediar-ai:main Feb 21, 2025
2 of 6 checks passed
@algora-pbc

algora-pbc bot commented Feb 21, 2025

🎉🎈 @shaik-zeeshan has been awarded $100! 🎈🎊

@neo773
Contributor

neo773 commented Feb 21, 2025

> added loading spinner to images
>
> Screen.Recording.2025-02-21.at.11.10.18.AM.1.mp4

That layout shift at the bottom is kinda annoying, I guess we could render a full row of skeleton images if user is typing?

@shaik-zeeshan
Contributor Author

shaik-zeeshan commented Feb 21, 2025

> That layout shift at the bottom is kinda annoying, I guess we could render a full row of skeleton images if user is typing?

Sure. We could do this while data is loading.
Will create another PR for this

@shaik-zeeshan shaik-zeeshan deleted the keyword-rewind-search branch February 24, 2025 03:14