@dixyes dixyes commented Apr 16, 2024

MUSA is a CUDA-like SDK for the Moore Threads platform, similar to HIP/ROCm: https://developer.mthreads.com/musa/musa-sdk

So far only the Makefile build is supported, plus a simple, dirty CMake implementation.

Use MUSA_ARCH=21 for S80

The stock MUSA header /usr/local/musa/include/internal/mublas-types.h breaks C++ compilation with GCC 12 and needs to be modified:

@@ -32,8 +32,8 @@
    Hence, only define __noinline__ when the code is being processed
    by a  MUSA compiler component.
 */   
-#define __noinline__ \
-        __attribute__((noinline))
+//#define __noinline__ \
+//        __attribute__((noinline))
 #endif /* __MUSACC__  || __MUSA_ARCH__ || __MUSA_LIBDEVICE__ */
         
 #define __forceinline__ \
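
For context on why commenting out the macro helps: the likely cause is that some host-side headers (GCC 12's libstdc++ appears to be one of them) write the attribute as `__attribute__((__noinline__))`. With the macro active, that expands to a nested `__attribute__` and fails to parse. The sketch below reproduces the clash in isolation and shows an alternative that leaves the SDK header untouched: `#undef` the macro before host code is compiled. It assumes GCC or Clang; `add_one` and `check_workaround` are hypothetical names used only for illustration, and the `#define` is a stand-in for the header, not the real file.

```cpp
// Sketch only: reproduces the macro clash in isolation; not the MUSA header itself.
#include <cassert>

// Stand-in for the definition shown in the diff above (assumed to be visible
// to host code when the header is pulled into a plain g++ translation unit).
#define __noinline__ __attribute__((noinline))

// With the macro active, `__attribute__((__noinline__))` would expand to
// `__attribute__((__attribute__((noinline))))`, which does not parse.
// Alternative to editing the header: drop the macro before such code is seen.
#ifdef __noinline__
#undef __noinline__
#endif

// Uses the double-underscore attribute spelling, as library headers may do.
__attribute__((__noinline__)) int add_one(int x) {
    return x + 1;
}

// Hypothetical helper for a quick sanity check of the workaround.
inline void check_workaround() {
    assert(add_one(41) == 42);
}
```

Removing the `#undef` block should make the `add_one` declaration fail to compile, which matches the symptom described above. Device code that relies on `__noinline__` would need the macro restored, so this only fits host-only translation units.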

github-actions bot commented Apr 16, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 423 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=11212.29ms p(95)=28275.6ms fails=, finish reason: stop=367 truncated=56
  • Prompt processing (pp): avg=125.18tk/s p(95)=555.78tk/s
  • Token generation (tg): avg=23.21tk/s p(95)=35.11tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=musa commit=ec3cc36dc835572d4e17cea727d831062da499bc

Charts (xychart, each titled "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 423 iterations", x-axis 1713256307 --> 1713256943):
  • llamacpp:prompt_tokens_seconds
  • llamacpp:predicted_tokens_seconds
  • llamacpp:kv_cache_usage_ratio
  • llamacpp:requests_processing

@dixyes dixyes marked this pull request as ready for review April 18, 2024 02:31
@mofosyne mofosyne added the "Review Complexity : Medium" and "enhancement" labels May 9, 2024
@dixyes dixyes closed this Sep 3, 2024
dixyes commented Sep 3, 2024

link: #8383
