LARGE AI GRAND CHALLENGE
The LARGE AI GRAND CHALLENGE is now closed!
Missed the LARGE AI GRAND CHALLENGE 1st webinar?
>> See the recording on AI-BOOST’s YouTube channel and clarify all your doubts!
Missed the LARGE AI GRAND CHALLENGE 2nd webinar?
>> See the recording on AI-BOOST’s YouTube channel and clarify all your doubts!
Did you go through the Large AI Grand Challenge page, watch the first and second webinar recordings and still have doubts? Consult the questions below 👇
Should I identify in my application which supercomputer (LUMI or LEONARDO) I am applying for?
- Yes! You need to clearly identify the target of your application (LUMI or LEONARDO).
Can Large Enterprises participate in the Large AI Grand Challenge?
- No. Only small and medium-sized enterprises (SMEs), as defined in Commission Recommendation 2003/361/EC, are eligible.
- Start-ups are covered by the SME definition in Commission Recommendation 2003/361/EC. However, please note that the challenge is open only to SMEs with technical capacity AND experience working on large-scale AI models.
Can an SME present a proposal targeting both supercomputers LUMI and LEONARDO?
- No. However, an SME can submit two proposals: one targeting the LUMI supercomputer and another, independent proposal targeting the Leonardo supercomputer. Nevertheless, it’s important to note that an individual SME is not eligible to receive both prizes.
What is the expected outcome of the challenge?
- The purpose of the Large AI Grand Challenge is to foster the development of large-scale AI models in Europe.
- The expected outcome of the Large AI Grand Challenge is the selection of up to 4 proposals to create innovative foundational language models that will outperform state-of-the-art systems in a number of relevant tasks. The development of these models should necessarily involve the use of High-Performance Computing (HPC).
- The model must be trained from scratch, possess a minimum of 30 billion parameters, and be trained following state-of-the-art optimal scaling laws for computing and training data size.
- No double financing: there is a strict prohibition of double funding for the same action. Actions that have already received an EU prize or grant cannot benefit from Large AI Grand Challenge rewards for the same activities. In particular, actions that have been funded through EuroHPC Access Calls will not be eligible for the Large AI Grand Challenge competition.
- In the case where two grants or prizes are received for the same project or action, beneficiaries must renounce one of them.
- In the event of duplicated funding, it will be deemed a breach and may lead to the withdrawal of funds, as well as potential legal actions. This provision aims to prevent resource duplication and ensure the fair allocation of funding across different projects.
Is the Challenge exclusively for language foundation models (LLM)?
- Yes, the LARGE AI GRAND CHALLENGE focuses exclusively on large language models, specifically innovative foundational language models. Non-language models are not expected or eligible. Multimodal models are welcome but are not expected.
What is the IPR ownership of results?
- Overall, the awarding entity does not acquire ownership of the outcomes generated within the framework of the prize. However, please remember to follow the open science approach. Moreover, additional bonuses will be assigned during the evaluation process to proposals that release artefacts, code and/or model weights as open source.
Do I need to prove the operational capacity and experience of the company?
- Yes. The challenge is open only to SMEs with technical capacity AND experience working on large-scale AI models.
Can SMEs from third countries participate?
- Single legal entities established in one of the eligible countries are eligible (consult the guide for applicants for the complete list of countries).
Should the project be generic, or can it target a specific industry or sector?
- It can be either. However, an innovative approach with potential target uses is very welcome.
Does the program support projects using TensorFlow to predict grain commodity output and prices, similar to Google/DeepMind’s AI weather forecasting systems?
- No. The program is for large language models. You can target any industry that you consider valuable. Pre-trained language models are not expected to be used. You may use additional models or data if you consider it appropriate.
Is this call open to all industries? Are there preferred industries? Also, do startups have to work with an industry partner to apply to this call?
- The Large AI Grand Challenge is open to all SMEs working with Generative AI that have demonstrated technical capacity and experience in large-scale AI models. Consortia are NOT eligible for the Large AI Grand Challenge, nor are single legal entities that are NOT SMEs (e.g., universities, research centres, NGOs, governmental organisations, large companies, etc.).
- If you work with the industry and can demonstrate an interest in its development for innovation purposes, it will be welcomed but is not mandatory.
Should the developed product be open source?
- Additional bonuses will be assigned during the evaluation process to proposals that release artefacts, code and/or model weights as open source.
- In any case, please keep in mind that Open Science is a legal obligation under Horizon Europe and the Large AI Grand Challenge competition, organised by AI-BOOST and EuroHPC.
Is this based on llama or llama2?
- The models should be trained from scratch. Pre-trained models are not expected to be used.
Does the de minimis rule apply to the grant from this programme?
- No. De minimis rules do NOT apply. However, there is a strict prohibition of double funding for the same action. Actions that have already received an EU prize or grant cannot benefit from Large AI Grand Challenge rewards for the same activities. In particular, actions that have been funded through EuroHPC Access Calls will not be eligible for the Large AI Grand Challenge competition.
- In the event of duplicated funding, it will be deemed a breach and may lead to the withdrawal of funds, as well as potential legal actions. This provision aims to prevent resource duplication and ensure the fair allocation of funding across different projects.
Is it possible to propose improvements to the LLAMA architecture?
- No. We expect only models trained from scratch, not pre-trained models.
What kind of data sources will be available and/or allowed to be used?
- No data sources will be available. A description of the data to be used should be provided, as well as a data management plan. All projects should comply with the GDPR and other applicable privacy protection and nondiscrimination rules.
What about LLM for biological (protein, genomics) data? Are foundational models for these kinds of data eligible?
- Foundation Language Models for biological (protein, genomics) data are eligible, but the model proposed should be a Foundation Language Model, defined as an AI model that is trained on broad textual data using self-supervision (also known as pretraining); contains at least 30 billion parameters; and is applicable across a wide range of contexts. Other types of AI models that are used for biological data are not eligible.
Are vertically focused language foundational models, for example, coding LLMs, within the scope of the challenge?
- Yes.
Are the competitions really limited to large-scale solutions on central cloud servers? Are solutions based on distributed Edge and Federated learning excluded on purpose?
- The competition will provide direct (bare-metal) access to supercomputing machines rather than through a cloud service. The aim is to train models on supercomputers such as LUMI and Leonardo, gaining experience in their use and opening opportunities for further exploitation of EuroHPC supercomputers by the EU AI community. As a result, scenarios utilising edge or federated computing would only apply to potential data collection, pre-processing, or post-processing tasks, not to the core training tasks.
If the startup does not have all the required expertise, is it possible to assemble a team for the programme?
- Applicants will need to demonstrate technical capacity and experience in working on large-scale AI models, as well as familiarity with supercomputers. This also covers the expertise you will bring in during the project.
Can the computing power be used for tokenising the data or is it limited to training of the models?
- It is up to the applicant to decide how to spend the GPU hours they get. If your data needs tokenising, you can do that. However, be mindful that resources are not unlimited, and neither is time.
Can we start from an existing model (e.g. LLAMA2 or Mistral) by resetting the weights, or is it necessary to start a new architecture?
- The model must be trained from scratch. Pre-trained models are not expected to be used.
- A critical component of the proposal will be the inclusion of a substantial differentiating factor compared to existing large language models. This could be accomplished either by introducing innovative enhancements or by devising novel models that effectively address the limitations of current ones.
Could the monetary prize be used to subcontract any tasks to other entities outside the SME?
- If selected, you will receive a monetary prize. It is not a grant. You can use the prize to develop the large-scale AI model described in the application form. See the guidelines for applicants for further details on compliance.
Can we apply with a large-scale general world model?
- Yes. Participants in the Grand Challenge are invited to submit a proposal for the development of a language foundation model, utilising one of the EuroHPC JU targeted facilities (i.e. LUMI or Leonardo supercomputers). The model must be trained from scratch, possess a minimum of 30 billion parameters, and be trained following state-of-the-art optimal scaling laws for computing and training data size.
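As a rough illustration of what "optimal scaling laws" can imply for a 30-billion-parameter model, the sketch below uses two widely cited rules of thumb. Both are assumptions made here for illustration, not requirements stated by the challenge: roughly 20 training tokens per parameter, and about 6 floating-point operations per parameter per training token. The function name and constants are hypothetical; applicants should justify their own scaling choices in the proposal.

```python
# Rough compute-optimal sizing sketch. Both constants are assumed
# rules of thumb, not figures taken from the challenge guidelines.
TOKENS_PER_PARAM = 20      # assumed tokens-per-parameter ratio
FLOPS_PER_PARAM_TOKEN = 6  # assumed forward+backward cost per parameter per token


def compute_optimal_estimate(n_params: int) -> tuple[int, int]:
    """Return (training tokens, total training FLOPs) for a model size."""
    tokens = TOKENS_PER_PARAM * n_params
    flops = FLOPS_PER_PARAM_TOKEN * n_params * tokens
    return tokens, flops


if __name__ == "__main__":
    # 30 billion parameters is the minimum model size named in the call.
    tokens, flops = compute_optimal_estimate(30_000_000_000)
    print(f"tokens: {tokens:.2e}, FLOPs: {flops:.2e}")
```

Under these assumptions the token count grows linearly with model size and the training compute grows quadratically, which is why the data description and the compute plan need to be considered together.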
What is the expected level of project management overheads (reports, calls, documentation, etc.) over the year?
- Upon completion of the allocation period, the SMEs are required to submit a comprehensive Final Report within three months.
- The SMEs are expected to present the main results at the EuroHPC summit to be held in March 2025 or a similar event.
- See also obligations about communication in guidelines for applicants.
What are the expectations regarding scientific publishing in this project (papers, conferences, etc.)?
- An open science approach is expected. Moreover, additional bonuses will be assigned during the evaluation process to proposals that release artefacts, code and/or model weights as open source.
- However, results and data may be kept closed if making them public in open access is against the SME’s legitimate interests (e.g., to facilitate commercial exploitation of results). If this is the case, it must be explained and justified.
When a company already started to train their own smaller model and is currently on its way, is it still eligible or does it have to start at zero, without any line of code written before the start of the programme?
- Yes, they are eligible.
Does the “from scratch” requirement mean that no work can already have been done when the funds are potentially awarded or just that the model has to be trained from scratch overall?
- It means that the model needs to be new and created by you. You cannot build upon existing models. If you are already in the process of creating it, it’s okay, you can apply.
Do you choose companies from different industries to create a heterogeneous environment during the program or is it independent?
- The proposals will be evaluated independently following the evaluation and selection criteria.
Are multimodal models with a strong focus on language disadvantaged compared to pure LLM applications?
- Not at all.
Are there any further benefits beyond the training capacity, prize and publicity that go beyond the cooperation during this one-year time slot? (i.e., co-marketing, community, ecosystem, follow-up funding).
- There will be follow-up funding and the expertise and network gained during the challenge will be a plus.
One of the most challenging tasks for the applicant would be procuring a very large amount of high-quality data. Will there be any support regarding that aspect?
- No. As a matter of fact, a description of the data to be used should also be provided in the application form, as well as a data management plan. All projects should comply with the GDPR and other applicable privacy protection and nondiscrimination rules.
What does it mean, that a startup team needs to have sufficient capacities? How is this measured?
- Applicants are required to demonstrate their team’s expertise in training foundation models and using HPC systems. The challenge is open only to SMEs with technical capacity AND experience working on large-scale AI models.
Is it important to have a business component of the proposal or just scientific?
- See innovation and impact within the selection criteria, among others. This section intends to assess the innovative nature, the potential impacts and contributions of the project. Sustainability and business impact should be described.
Is it possible for 2 different SMEs to apply as a joint venture?
- No.
Do we need to share the developed models in open source?
- Not necessarily, if justified.
Is funding provided upfront and available to be used during the project, or is it provided at the end?
- The prizes will be disbursed in a single payment via bank transfer, after the Large AI Grand Challenge ceremony, provided that all the requested documents have been submitted.
Only language models, correct? Image recognition models and others aren’t eligible, correct?
- It might be multimodal, but it should have a language component.
How do we know what the maximum capacity of LUMI or LEONARDO is at a given time?
- Applicants are required to provide a detailed project scope and plan for the development of the large-scale AI model. The proposal should provide a rationale for the utilization of HPC facilities, accompanied by a well-defined plan.
- Projects are expected to use one of the two EuroHPC facilities, the LUMI or Leonardo supercomputer, with a maximum of 2 million GPU hours per system in total (including pre-processing, parallelisation, experimenting, benchmarking and training). A description of the data to be used should also be provided, as well as a data management plan. All projects should comply with the GDPR and other applicable privacy protection and nondiscrimination rules.
- It is therefore important to describe your plan (and capacities) in the proposal, including any limitations of this kind, so that they can be taken into account.
- Successful applications are notified by email and are contacted by the technical teams of the assigned supercomputer centre for the onboarding procedure.
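To sanity-check a plan against the 2 million GPU-hour ceiling, a back-of-the-envelope estimate can help. The sketch below is illustrative only: the sustained per-GPU throughput and the share of hours reserved for non-training work are hypothetical inputs that each applicant would need to measure or justify for their own setup, and the function names are not part of any official tool.

```python
GPU_HOUR_CAP = 2_000_000  # maximum GPU hours per system stated in the call


def training_gpu_hours(total_flops: float, sustained_flops_per_gpu: float) -> float:
    """GPU hours needed to execute total_flops at a sustained per-GPU rate (FLOP/s)."""
    seconds = total_flops / sustained_flops_per_gpu
    return seconds / 3600


def fits_budget(train_hours: float, overhead_fraction: float) -> bool:
    """Check the plan against the cap, reserving a share for other work.

    overhead_fraction is the assumed share of the allocation spent on
    pre-processing, experimenting and benchmarking rather than final training.
    """
    available = GPU_HOUR_CAP * (1.0 - overhead_fraction)
    return train_hours <= available


if __name__ == "__main__":
    # Hypothetical inputs: 1e23 training FLOPs at 1e14 sustained FLOP/s per GPU,
    # with a quarter of the allocation kept for non-training work.
    hours = training_gpu_hours(1e23, 1e14)
    print(round(hours), fits_budget(hours, overhead_fraction=0.25))
```

Reserving an explicit overhead share reflects the fact that the cap covers pre-processing, experimenting and benchmarking as well as the final training run.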
Can you tell us more about multimodal models – what does it mean exactly that they are “not expected”? What is the requirement for data used to be fit for this challenge?
- We focus on large language models, not other types of models (e.g. vision, etc.). Multimodal models will not be awarded additional points in the evaluation.
Do all team members have to be on the F6S platform or only one person (the one that applies)?
- One person is enough.
What is the min & max size of the team?
- There is no restriction on the size. However, the applicants must demonstrate that they have the technical capacity AND experience working on large-scale AI models.
- Overall, applicants are required to provide a detailed project scope and plan for the development of the large-scale AI model. They must also offer solid justification for the planned model’s relevance, the justification for the use of HPC, and demonstrate their team’s expertise in training foundation models using HPC systems, as well as the efficient use of the target supercomputers (i.e., LUMI or Leonardo), along with an appropriate plan of the use of the computing time. Consult the guidelines for applicants for further information.
If we have already applied for time in LUMI do we need to give it up?
- Actions that have already received an EU prize or Grant cannot benefit from Large AI Grand Challenge rewards for the same activities. In particular, actions that have been funded through EuroHPC Access Calls will not be eligible for the Large AI Grand Challenge competition.
- In the case where two grants or prizes are received for the same project or action, beneficiaries must renounce one of them.
- In the event of duplicated funding, it will be deemed a breach and may lead to the withdrawal of funds, as well as potential legal actions. This provision aims to prevent resource duplication and ensure the fair allocation of funding across different projects.
How can you apply?
The Large AI Grand Challenge is open from 16 November 2023, 12h CET to 16 January 2024, 17h CET.
Please read and complete the following documents and apply via the F6S platform -> https://www.f6s.com/large-ai-grand-challenge/apply
Still have doubts? Drop us a line > info[@]aiboost-project.eu