We’ve gathered the questions you asked during our webinar and included responses from our ML team at Loka. You can also access our webinar recording and slides anytime using 👉🏽 this link.
| Questions from Audience | Answers from Loka |
|---|---|
| By RLHF do you mean reinforcement learning with human feedback, or something else? | Yes, reinforcement learning from human feedback. |
| Do you find customers wanting to move straight to production with their POC instead of waiting for their MVP? What kind of POC can you build to make this happen? | Most customers choose to take the intermediate step of having a functioning GenAI application that isn't fully production-ready so they can test with users and get feedback. However, you can certainly go straight to a production implementation. An MVP often focuses primarily on scaling out the model from the POC scope, potentially creating an interface, or integrating the model with an existing application or workflow. A production implementation still needs to do this work, but also adds considerations like ongoing operations, automated tests and deployment, security hardening, etc. |
| Getting data from an external system: should we use RAG or a Vector DB? How do we decide? | It seems there might be some confusion: RAG (Retrieval-Augmented Generation) is not a Vector DB. RAG is an architecture for LLM (Large Language Model) systems, which often (but not always) relies on a Vector DB as its retrieval layer. A rough illustration of how the two relate is sketched below this table. |
| Can you repeat that quote about how the POC should be all about evaluating what's possible and production should be...? | Yes, the POC is about figuring out if it's possible, and production is about the best way to do it. To elaborate further, for the POC (proof of concept), we're interested in iterating fast and understanding whether the use case is feasible at all. Here, Bedrock is very useful because it allows us to build quickly without worrying about deploying the model, hosting, and all of that. For production, we take into account other variables like cost, latency, and other aspects. This is where it can make sense to develop a model that's optimized for your use case and hosted in SageMaker. |
| How do I choose between a cloud-based or a locally managed LLM? Do you have a checklist, especially for models that can be self-managed? | Generally, the criteria are team skills/capacity and cost. A couple of notes to consider:<br>(1) Is your team capable of managing a deployed LLM? Consider uptime and optimizations. Does your team have the skills to do this? Does your team have the bandwidth to maintain this deployment?<br>(2) Consider traffic patterns and the latency you need. A self-hosted LLM can have lower latency but can also be much more expensive, particularly if you have inconsistent traffic patterns. On the flip side, if you have the skills and adequate traffic patterns, you can optimize your system to lower costs. |
| How do we plan for a cost-effective GenAI solution based on known practices? | Cost is driven by four factors:<br>(1) Can you use an existing model, or do you need to fine-tune/train a model?<br>(2) How much text or other content is required as input to and output from the model?<br>(3) Do you need to store and retrieve context (e.g., RAG)?<br>(4) What size of model do you need? Can you get good enough performance from a smaller LLM?<br>Your use case will determine the answers to these questions. More complex workloads, especially those involving fine-tuning, can get very costly, so it's important to choose a use case that has a measurable financial impact on your company (e.g., attracts or retains users, improves internal operations/efficiency, contributes to the core IP of the company, etc.). |
| Do we require multiple models, or can they be merged into one model based on the use case? Or will routing be required? | We're not sure we fully understand the question, but the answer will always depend on your use case and solution. Assuming you have multiple tasks, we recommend using multiple models to lower costs: use the simplest model that still performs up to your standards for each task. However, if you want to cut development time, reduce development costs, get to market faster, or for any other reason, you can use a single more powerful model for all tasks. |
| How do we monitor models and other GenAI artifacts, both operationally and for better dollar-value outcomes? | We interpret this question as being very broad, so we'll offer some general suggestions. First, monitor your system as you would any other ML project; LLMOps is very relevant here. Regularly measure performance metrics and, if possible, gather feedback from users. We also suggest logging as much as you can (particularly failure cases) to validate where the system failed and expand your testing dataset. Log intermediate steps in addition to prompts, context, and outputs, and make sure the specific models used are part of your logs to ensure repeatability (see the logging sketch below this table). On an infrastructure level, apply the usual best practices: monitor calls and traffic patterns, resource usage, uptime, and all other common metrics. |
| How do we define SLAs for prompt-based GenAI or other GenAI-scoped use cases? | It depends on the use case, but the most common way to enforce SLAs is to ensure the LLM produces structured output. There are many techniques (particularly prompt engineering ones) to make the LLM output easily parsable; a small structured-output sketch follows this table. For latency, it shouldn't be too different from traditional endpoints; just make sure to account for context size and task complexity when defining the SLA latencies. In terms of quality, guaranteeing a specific level is much harder and varies widely by use case, so we don't have any generic insights to offer. |
| Has your team leveraged the LangChain ecosystem for any use case? If so, please explain a bit. | Yes, our team regularly leverages LangChain; a minimal example is sketched below this table. We also use other common libraries like LlamaIndex, LangGraph, Haystack, and others. Although these tools are still quite young and change very frequently, we find that using common tools like these speeds up development, helps ensure best practices are followed, and provides access to the latest techniques as they become commonplace. |
| One of the key challenges for an enterprise is deciding when to host/build internally, rely on suppliers/partners, or use managed services and off-the-shelf solutions. Is there any guidance on how we can approach that decision? | Is this feature/system/product central to your product/business, or does it add significant value? If the answer to both is no, we'd argue you should steer clear of building internally. If it is core, consider whether you have the team for internal development and are willing to invest in it. Developing an internal solution can drive huge benefits, but it is also expensive and a long-term commitment, so be sure you are aware of this. Ensure your team has the bandwidth and skills for it, and that you are willing to support them through development. Finally, consider the use case specifically. Is it a well-understood/common use case? If yes, you may be better off with an off-the-shelf solution. Is there a possibility of continuously improving the solution and having it become a significant differentiating factor? If so, developing internally would be our advice, but answering this question can be surprisingly tricky and requires some experience in the area. |
| What about determining which LLM would be the best fit for my data and what I want to be able to do? What steps should be taken to determine this? What key aspects of an LLM should I be aware of (e.g., model architecture, parameter count, pre-training, fine-tuning capability, task performance, inference speed, cost of use)? | A good first step is generally checking task performance on public leaderboards, either for the specific task you want the model to perform, a similar task available on those leaderboards, or even just generic task leaderboards. However useful those insights are, the best way is to give candidate models a short test run on your specific task. Generally, start with smaller (cheaper) models and move up to more expensive models, generally with more parameters, as you find they aren't performing up to your needs. An important consideration is whether you want to use open or closed-source models; this often comes down to a matter of preference, cost, and performance. Finally, consider whether you'll be looking into fine-tuning your model. If this is something you know you'll need, ensure the model you choose is relatively easy to fine-tune. |
| Can you share success stories of how Loka has helped clients migrate complex big data workloads or optimize AI/ML workloads? | You can find our latest case studies at loka.com/work. If you want more of a deep dive into our migration success stories, please reach out at loka.com/contact and mention "big data and AI/ML migrations". |
| How do you measure/monitor hallucinations? | Measuring hallucinations directly is extremely difficult in most cases, so we don't measure them directly. Generally, you use other metrics that can indicate a larger or smaller presence of hallucinations. For example, measuring faithfulness in RAG systems is a good way to validate whether your system is basing its responses on the context or on internal knowledge. Another useful technique is to add self-check steps, which lower hallucinations by themselves and can provide a more qualitative measure of how your base LLM performs; a rough self-check sketch appears below this table. |
| What do you charge for the POC, baseline? | The cost for our POC services varies based on specific needs. To get a detailed quote and learn more about our GenAI POC offer, visit loka.com/genaiworkshop. |
| If you are in the middle of a major move to AWS, how do you best take advantage of the new AWS AI services? | Key services to explore include Amazon Q, Amazon Bedrock, AWS Copilot, and Amazon SageMaker. Each of these services has its own strengths and applications. Working with your AWS account team or a partner like Loka can help you identify which of these services will provide the most value for your company's specific use cases. For personalized guidance, consider reaching out to Loka at loka.com/contact. |
| Could we have an idea of the cost involved in developing an AI project? For one of the examples you talked about? Just a ballpark idea. | While specific costs vary, AI projects generally involve expenses for data collection, model development, and deployment. For a mid-sized project like the examples we discussed, you'll need to invest in data infrastructure and model training. Working with Loka can help streamline this process. For a detailed estimate, feel free to review our listing on AWS Marketplace. |
| Do you have any sample or reference projects leveraging some of these AWS services? | All of the customers that Loka referenced during the webinar are leveraging AWS services, such as Amazon Bedrock, Amazon SageMaker, and a broad spectrum of other AWS solutions. You can find more success stories at loka.com/work. |
| I would like to learn more about AI and AWS | Great! You can learn about AI and AWS at loka.com/aws/genai and aws.com. |
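Below are a few rough Python sketches to illustrate some of the answers above. First, on RAG vs. Vector DBs: the vector DB is only the storage-and-lookup piece, while RAG is the overall retrieve-then-generate flow. This is a minimal, framework-free sketch of that flow; the `embed` function and the final LLM call are stand-ins for a real embedding model and a Bedrock or SageMaker endpoint.

```python
# Toy illustration of the RAG pattern. A vector DB would replace the in-memory
# "index" below at scale; embed() is a stand-in for a real embedding model
# (e.g., one served via Amazon Bedrock).
import math

def embed(text: str) -> list[float]:
    # Hashed bag-of-words "embedding", purely for illustration.
    vec = [0.0] * 8
    for word in text.lower().split():
        vec[hash(word) % 8] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# 1) Index documents (the part a vector DB handles for you in production).
documents = [
    "Loka helps companies build GenAI POCs on AWS.",
    "Amazon Bedrock offers managed access to foundation models.",
]
index = [(doc, embed(doc)) for doc in documents]

# 2) Retrieve the most relevant context for an incoming question.
question = "What does Bedrock provide?"
q_vec = embed(question)
best_doc = max(index, key=lambda item: cosine(q_vec, item[1]))[0]

# 3) Generate: send the question plus retrieved context to an LLM.
prompt = f"Answer using only this context:\n{best_doc}\n\nQuestion: {question}"
print(prompt)  # In a real system, this prompt goes to the model.
```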
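On monitoring and logging: a minimal sketch of the logging habits mentioned in the answer above, recording the prompt, context, intermediate steps, output, and the exact model used so that failures can be replayed and added to a test set. The `log_llm_call` helper and the model ID shown are illustrative, not a prescribed interface.

```python
# Log every LLM call as a structured record so failure cases can be replayed.
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("genai")

def log_llm_call(model_id: str, prompt: str, context: str, output: str, steps: list[str]) -> None:
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_id": model_id,          # exact model/version, for repeatability
        "prompt": prompt,
        "context": context,
        "intermediate_steps": steps,   # e.g., retrieved chunks, tool calls
        "output": output,
    }
    logger.info(json.dumps(record))    # ship these lines to your log aggregator

log_llm_call(
    model_id="anthropic.claude-3-haiku-20240307-v1:0",  # illustrative model ID
    prompt="Summarize the ticket.",
    context="Ticket #123: duplicate charge",
    output="Customer reports a duplicate subscription charge.",
    steps=["retrieved 1 document"],
)
```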
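On SLAs and structured output: a small sketch of prompting for a parsable JSON response and validating it, so you can count malformed outputs and time each call against a latency SLA. The `call_llm` helper is a hypothetical stand-in for your actual model endpoint.

```python
# Enforce a structured (JSON) output so SLA compliance can be measured.
import json
import time

PROMPT_TEMPLATE = (
    "Classify the support ticket below.\n"
    'Respond with JSON only, in the form {{"category": "...", "urgency": "low|medium|high"}}.\n\n'
    "Ticket: {ticket}"
)

def call_llm(prompt: str) -> str:
    # Stand-in: replace with a real model invocation (Bedrock, SageMaker, etc.).
    return '{"category": "billing", "urgency": "high"}'

def classify(ticket: str, max_retries: int = 2) -> dict:
    start = time.perf_counter()
    for _ in range(max_retries + 1):
        raw = call_llm(PROMPT_TEMPLATE.format(ticket=ticket))
        try:
            result = json.loads(raw)
            if result.get("urgency") in {"low", "medium", "high"}:
                result["latency_s"] = round(time.perf_counter() - start, 3)
                return result
        except json.JSONDecodeError:
            pass  # Malformed output: retry and log it as an SLA violation.
    raise ValueError("Model did not return valid structured output")

print(classify("I was charged twice for my subscription."))
```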
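On LangChain: a minimal example of the kind of chain we might assemble with LangChain's expression language and the `langchain-aws` integration for Bedrock. Treat it as a sketch only; these libraries change quickly, so imports and class names may differ by version, and running it requires AWS credentials with Bedrock access.

```python
# Minimal LangChain chain: prompt -> Bedrock-hosted model -> plain-string output.
# Requires the langchain-core and langchain-aws packages plus AWS credentials.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_aws import ChatBedrock

prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
llm = ChatBedrock(model_id="anthropic.claude-3-haiku-20240307-v1:0")  # illustrative model ID
chain = prompt | llm | StrOutputParser()

print(chain.invoke({"text": "Loka helps companies take GenAI use cases from POC to production on AWS."}))
```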
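On hallucinations: a rough sketch of the self-check idea, where a second model call grades whether the first answer is supported by the retrieved context, giving a coarse faithfulness signal you can monitor over time. Again, `call_llm` is a hypothetical stand-in for a real endpoint.

```python
# Self-check step: ask a model whether the answer is supported by the context.
def call_llm(prompt: str) -> str:
    return "SUPPORTED"  # stand-in response for illustration

def self_check(answer: str, context: str) -> bool:
    verdict = call_llm(
        f"Context:\n{context}\n\n"
        f"Answer:\n{answer}\n\n"
        "Is every claim in the answer supported by the context? "
        "Reply with exactly SUPPORTED or UNSUPPORTED."
    )
    return verdict.strip().upper() == "SUPPORTED"

# Track the share of UNSUPPORTED verdicts as a proxy hallucination metric.
print(self_check(
    "Bedrock is a managed service for foundation models.",
    "Amazon Bedrock is a managed service offering access to foundation models.",
))
```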
<aside> 🌎 More questions about your use case or Loka’s offerings? Visit our listing on AWS Marketplace.
</aside>