Description
MediaPipe Solution (you are using)
MediaPipe LLM Inference API
Programming language
TBD
Are you willing to contribute it
Yes
Describe the feature and the current behaviour/state
At the moment, for Gen AI use cases in the browser (e.g. Gemma 2B with the MediaPipe LLM Inference API), there's no way for a developer to know ahead of time whether the model can actually run on the device in a reasonable time. This is an issue because:
- For Gen AI, the model download is very large (almost 1.3GB for Gemma 2B, many times the recommended web app size).
- Running an inference on a low-spec device, or on a device with too many operations already running, may be very slow, or even crash the device (on mobile).
This leads to a subpar UX: a user may wait for a large model download only to find that it can't run inferences in a reasonable time on their device, or that it even crashes their device.
What if we ran a mini-benchmark ahead of the model download? This is beaufortfrancois@'s idea, which he suggested for Transformers.js: huggingface/transformers.js#545 (comment).
This would involve running the model code with zeroed-out weights.
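Very roughly, the benchmark would only need to time a handful of decode steps on the zero-weight graph and report a tokens-per-second figure. Here's a minimal TypeScript sketch of that measurement, assuming the runtime can expose a single decode step as a callback (the names below are placeholders, not existing MediaPipe APIs):

```ts
// Sketch only: `runOneDecodeStep` stands in for a hypothetical hook into
// the zero-weight model graph; it is not part of the current LLM Inference API.
async function measureDecodeTokensPerSecond(
    runOneDecodeStep: () => Promise<void>,
    steps = 16): Promise<number> {
  // Warm-up step so kernel/shader compilation isn't counted.
  await runOneDecodeStep();
  const start = performance.now();
  for (let i = 0; i < steps; i++) {
    await runOneDecodeStep();
  }
  const elapsedMs = performance.now() - start;
  return steps / (elapsedMs / 1000);
}
```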
Will this change the current API? How?
Yes, as we'd want to expose to developers the output of the mini-benchmark. This output may be abstracted behind a few dev-friendly performance buckets, e.g. high, medium, low. Developers could overlay their own logic based on that output.
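For illustration only, the developer-facing surface could look something like the sketch below. The benchmark-related names (runMiniBenchmark, MiniBenchmarkResult, modelArchitecture) don't exist in @mediapipe/tasks-genai today; they're assumptions about what such an API might expose, layered on the existing FilesetResolver/LlmInference entry points.

```ts
import {FilesetResolver, LlmInference} from '@mediapipe/tasks-genai';

// Hypothetical types and method, shown only to illustrate the proposal.
type PerformanceBucket = 'high' | 'medium' | 'low';

interface MiniBenchmarkResult {
  bucket: PerformanceBucket;
  tokensPerSecond: number;  // measured with zeroed-out weights
}

async function shouldOfferOnDeviceGemma(): Promise<boolean> {
  const genai = await FilesetResolver.forGenAiTasks(
      'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm');

  // Hypothetical call: runs the model graph with zeroed-out weights,
  // so no 1.3GB download happens before we know the device can cope.
  const result: MiniBenchmarkResult = await LlmInference.runMiniBenchmark(
      genai, {modelArchitecture: 'gemma-2b'});

  // Developers overlay their own logic on top of the bucket.
  return result.bucket !== 'low';
}
```

An app could run something like this before even showing the download prompt, and fall back to a server-side model when the bucket comes back as low.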
Who will benefit with this feature?
All developers for on-device/in-browser use cases
Please specify the use cases for this feature
All on-device/in-browser use cases
Any Other info
No response