• SpicyToaster420@sopuli.xyz
    link
    fedilink
    arrow-up
    4
    ·
    4 days ago

    Awesome use of LLMs. I wonder they didn’t use FP8 quantization though, especially since their target hardware was an L40s.

    • bitfucker
      link
      fedilink
      arrow-up
      5
      ·
      4 days ago

      In the context of machine learning, usually a list of numbers that is arranged in a certain way, and is used for mathematical operation. You can think of it as a transfer/transform “function” that takes data as input, and spits out the representation of said data in some other way (that we usually don’t know until the training is finished and we analyze the result)

        • bitfucker
          link
          fedilink
          arrow-up
          3
          ·
          3 days ago

          No. Normally the kernel doesn’t get updated in the network during training. They are called hyper-parameters. They do affect training, but they are not updated by the training algorithm

OSZAR »