We're only laying the foundations: a Digital Summit Conversation
Gen AI in 2024
As Gen AI grows in popularity, software engineers face plenty of important questions. These questions extend not only to choosing foundation models for a given use case, but also to using the technology responsibly, with respect to transparency and explainability.
Artificial intelligence can trace its roots back to John von Neumann and Alan Turing in 1950, the year that Turing published his famous article, "Computing Machinery and Intelligence", giving rise to the "Turing test": a way to determine whether machines can think like humans. Around a decade ago, advances in convolutional neural networks meant that AI systems could recognise objects, scenes and people (facial recognition). This increased the rate of innovation in areas such as natural language processing via machine translation and transformer architectures.
Now, with Generative AI, which has exploded into the collective consciousness, the Turing test is close to being passed. Hype cycles chart the viability of new technologies and how they might evolve over time. Often, after the initial hype, interest wanes and technologies fade away. With Gen AI, that drop-off is not being observed.
This was a view that Jason Richards, Head of Portfolio Technology, Hg, shared in discussion with Denis Batalov, Worldwide Technical Leader, AI/ML, Amazon Web Services, at the Hg Digital Summit 2024.
Power of experimentation
With any new technology, it is important to test different tools and experiment with them to develop an understanding. Hackathons and R&D weeks are a good way to approach this within organisations, by giving people the freedom to think creatively and express themselves. This can have a powerful impact and lead to a lot of innovation.
AWS allows its customers to do precisely this. It launched PartyRock (powered by Amazon Bedrock), a free educational tool that lets people build Gen AI applications simply by writing a description of the app’s purpose; the only limit being the individual’s imagination. This illustrates well the power of natural language processing within Gen AI, where people don’t necessarily need to have coding skills.
There has been significant AI evolution. Traditionally, within AI/ML, the approach has been to build specific ML models for specific tasks. For example, AWS has built and released purpose-built services such as Amazon Translate for machine translation and Amazon Transcribe for speech-to-text.
Within Gen AI, we have foundation models that are able to solve myriad different problems. This is because they are trained on a wide range of data sets and domains, which makes them a powerful proposition. Moreover, through prompt engineering, and by training the models on an organisation’s own proprietary data, they can be applied to specific problems to solve specific tasks.
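The idea of steering a general-purpose model towards a specific task can be sketched with a simple prompt-engineering example: proprietary context is injected into the prompt so the model answers from the organisation's own data. The function and the sample documents below are hypothetical, purely for illustration.

```python
def build_prompt(task: str, context_docs: list[str], question: str) -> str:
    """Assemble a task-specific prompt that grounds a general-purpose
    foundation model in an organisation's own (proprietary) context."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        f"You are an assistant for the following task: {task}\n"
        "Use only the context below to answer.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical proprietary snippets a company might supply.
prompt = build_prompt(
    task="customer support for our billing product",
    context_docs=[
        "Invoices are issued on the 1st of each month.",
        "Refunds are processed within 5 working days.",
    ],
    question="When will I receive my refund?",
)
```

The same general-purpose model can then serve many tasks simply by varying the task description and the injected context, rather than training a separate model per task.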
In Richards’ view, there is no right or wrong approach to which model will be the most appropriate. Much will depend on experimenting and evaluating different models, depending on an organisation’s particular use case.
“You could start with a foundation model to see how well it works for your specific use case and, if it solves it, then great. Then the next question is, how much does it cost? Because it’s a general-purpose model, it will be larger and require more compute to run inference. It may work in terms of cost benefit, but maybe not; in which case, you might want to explore smaller, more customised models - even specialised hardware - to help with the trade-off between performance and cost,” explained Batalov.
Laying the right foundations
With a rapid expansion in the number of foundation models available for organisations to consider, making the right choice can be daunting. Such is the pace of innovation that a new, more powerful model seems to be unveiled every week. This is an important practical problem to overcome: not only does the chosen model need to fit the business’s needs, it also needs to be cost-effective. If companies spend substantial operating capital on a foundation model, only for it to translate into a modest revenue uptick, the cost can be hard to justify.
One option for software companies to consider is benchmarking. It is worth noting, however, that model builders often optimise their models for the benchmarks themselves. Understanding why models make particular predictions, and whether they were affected by bias, is vital when choosing the right model for a specific use case or domain.
To address the issue of bias, AWS has introduced SageMaker Clarify, which enables people to run their own benchmark evaluations on their shortlist of preferred models. Over 21 metrics are available, such as accuracy, robustness and toxicity.
“You can do this using your own custom prompts that are specific to your application. The same is available within the Amazon Bedrock ecosystem. We’ve released this as open source on GitHub so people can look at the code and see how these evaluations are done,” Batalov told Richards.
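The idea of evaluating a shortlist of models against your own custom prompts can be illustrated with a toy harness. The "models" below are stand-in functions and the accuracy metric is deliberately simple (exact match); in practice the answers would come from real model endpoints and the metrics would be richer, as in the evaluations described above.

```python
def evaluate(model, eval_set):
    """Return the fraction of prompts the model answers exactly right
    (a minimal 'accuracy' metric over a custom evaluation set)."""
    correct = sum(1 for prompt, expected in eval_set if model(prompt) == expected)
    return correct / len(eval_set)

# A tiny custom evaluation set, specific to your application.
eval_set = [("2+2=", "4"), ("Capital of France?", "Paris")]

# Stand-in "models": plain functions mapping a prompt to an answer.
models = {
    "model-a": lambda p: {"2+2=": "4", "Capital of France?": "Paris"}.get(p, ""),
    "model-b": lambda p: "4",  # always answers "4", right or wrong
}

scores = {name: evaluate(m, eval_set) for name, m in models.items()}
# model-a scores 1.0, model-b scores 0.5
```

Running the same evaluation set against every shortlisted model gives a like-for-like comparison grounded in your own use case rather than in public benchmarks.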
With power comes responsibility
Responsible AI should be part of the model selection process. Are models responding with toxic language because of the training data that was used? Another nuance is stereotyping: if the model is talking about a particular profession, what pronouns does it use? Factual accuracy also matters when appraising model responses. Explainability is an area of active research: some models provide explanations, but these can contain hallucinations that people need to be careful of.
Techniques such as chain-of-thought prompting, where models explain step by step how they arrive at an answer, are also a consideration.
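A chain-of-thought style prompt can be as simple as asking the model to show its working before stating a final answer, which makes the response easier to appraise. The template wording below is illustrative, not a prescribed format.

```python
def cot_prompt(question: str) -> str:
    """Wrap a question in a chain-of-thought style instruction:
    reason step by step, then give a clearly marked final answer."""
    return (
        f"Question: {question}\n"
        "Think through the problem step by step, showing your working, "
        "then state the final answer on its own line prefixed with 'Answer:'."
    )

print(cot_prompt("A train travels 120 km in 2 hours. What is its average speed?"))
```

The step-by-step explanation gives reviewers something concrete to check, though the reasoning itself should still be verified, since it can contain hallucinations.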
“Transparency is an important consideration. You’ll want to get some information on how the model was trained, what data sources were used, and what controls were in place before using the model.”
Furthermore, the ISO/IEC 42001 standard was recently introduced for companies to establish an Artificial Intelligence Management System.
As the open source community actively contributes to model designs and approaches, one trend to watch will be the introduction of multi-modal solutions that combine text, code, video, images and audio. Large world models, which operate on a non-image, non-text basis – for example, monitoring machinery to predict outcomes by generating sensory information – are another trend to look out for as the Gen AI hype cycle continues to build momentum.